Abstract

The formula-evaluation problem is defined recursively. A formula's evaluation is the evaluation
of a gate, the inputs of which are themselves independent formulas. Despite this pure
recursive structure, the problem is combinatorially difficult for classical computers.

A quantum algorithm is given to evaluate formulas over any finite boolean gate set. Provided 
that the complexities of the input subformulas to any gate differ by at most a constant factor, 
the algorithm has optimal query complexity. After efficient preprocessing, it is nearly time 
optimal. The algorithm is derived using the span program framework. It corresponds to the 
composition of the individual span programs for each gate in the formula. Thus the algorithm's 
structure reflects the formula's recursive structure. 

1 Introduction 

A $k$-bit gate is a function $f : \{0,1\}^k \to \{0,1\}$. A formula $\varphi$ over a set of gates $S$ is a rooted tree
in which each node with $k$ children is associated to a $k$-bit gate from $S$, for $k = 1, 2, \ldots$. Any such
tree with $n$ leaves naturally defines a function $\varphi : \{0,1\}^n \to \{0,1\}$, by placing the input bits on
the leaves in a fixed order and evaluating the gates recursively toward the root. Such functions are
often called read-once formulas, as each input bit is associated to one leaf only.
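The recursive definition above can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of the tree, the gate functions `AND`, `OR`, `MAJ3`, and the names `evaluate` and `num_leaves` are assumptions made for the example and do not come from the paper.

```python
# A formula is either a leaf (an integer index into the input string x)
# or a pair (gate, children), where gate maps a list of bits to one bit.
# This encoding is a hypothetical choice made for the illustration.

def AND(bits):
    return int(all(bits))

def OR(bits):
    return int(any(bits))

def MAJ3(bits):
    return int(sum(bits) >= 2)

def evaluate(formula, x):
    """Evaluate the gates recursively from the leaves toward the root."""
    if isinstance(formula, int):      # leaf: read one input bit
        return x[formula]
    gate, children = formula
    return gate([evaluate(child, x) for child in children])

def num_leaves(formula):
    """Size n of the formula, i.e., its number of leaves."""
    if isinstance(formula, int):
        return 1
    _, children = formula
    return sum(num_leaves(child) for child in children)

# Read-once example: phi(x) = (x_0 AND x_1) OR x_2
phi = (OR, [(AND, [0, 1]), 2])
```

This evaluator reads every leaf, so it always makes $n$ queries; the question studied in the text is how many of those queries can be avoided.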

The formula-evaluation problem is to evaluate a formula $\varphi$ over $S$ on an input $x \in \{0,1\}^n$. The
formula is given, but the input string $x$ must be queried one bit at a time. How many queries to $x$
are needed to compute $\varphi(x)$? We would like to understand this complexity as a function of $S$ and
asymptotic properties of $\varphi$. Roughly, larger gate sets allow $\varphi$ to have less structure, which increases
the complexity of evaluating $\varphi$. Another important factor is often the balancedness of the tree $\varphi$.
Unbalanced formulas often seem to be more difficult to evaluate.

For applications, the most important gate set consists of all AND and OR gates. Formulas over 
this set are known as AND-OR formulas. Evaluating such a formula solves the decision version of a
MIN-MAX tree, also known as a two-player game tree. Unfortunately, the complexity of evaluating 
formulas, even over this limited gate set, is unknown, although important special cases have been 
solved. The problem over much larger gate sets appears to be combinatorially intractable. For 
some formulas, it is known that "non-directional" algorithms that do not work recursively on the 
structure of the formula perform better than any recursive procedure. 

In this article, we show that the formula-evaluation problem becomes dramatically simpler
when we allow the algorithm to be a bounded-error quantum algorithm, and allow it coherent
query access to the input string $x$. Fix $S$ to be any finite set of gates. We give an optimal quantum
algorithm for evaluating "almost-balanced" formulas over $S$. The balance condition states that
the complexities of the input subformulas to any gate differ by at most a constant factor, where
complexity is measured by the general adversary bound $\mathrm{Adv}^\pm$. In general, $\mathrm{Adv}^\pm$ is the value of an
exponentially large semi-definite program (SDP). For a formula $\varphi$ with constant-size gates, though,
$\mathrm{Adv}^\pm(\varphi)$ can be computed efficiently by solving constant-size SDPs for each gate.

Table 1: Comparison of some classical and quantum query complexity results for formula evaluation.
Here $S$ is any fixed, finite gate set, and the exponent $\alpha$ is given by $\alpha = \log_2\big(\frac{1+\sqrt{33}}{4}\big) \approx 0.753$. Under
certain assumptions, the algorithms' running times are only poly-logarithmically slower.

To place this work in context, some classical and quantum results for evaluating formulas are 
summarized in Table 1. The stated upper bounds are on query complexity and not time complexity.
However, for the $\mathrm{OR}_n$ and balanced $\mathrm{AND}_2$-$\mathrm{OR}_2$ formulas, the quantum algorithms' running times
are only slower by a poly-logarithmic factor. For the other formulas, the quantum algorithms'
running times are slower by a poly-logarithmic factor provided that:

1. A polynomial-time classical preprocessing step, outputting a string $s(\varphi)$, is not charged for.

2. The algorithms are allowed unit-cost coherent access to $s(\varphi)$.

Our algorithm is based on the framework relating span programs and quantum algorithms 
from [Rei09a]. Previous work has used span programs to develop quantum algorithms for evaluating
formulas [RS08]. Using this and the observation that the optimal span program witness size for
a boolean function $f$ equals the general adversary bound $\mathrm{Adv}^\pm(f)$, Ref. [Rei09a] gives an optimal
quantum algorithm for evaluating "adversary-balanced" formulas over an arbitrary finite gate set.
The balance condition is that each gate's input subformulas have equal general adversary bounds. 

In order to relax this strict balance requirement, we must maintain better control in the recursive 
analysis. To help do so, we define a new span program complexity measure, the "full witness 
size." This complexity measure has implications for developing time- and query-efficient quantum 
algorithms based on span programs. Essentially, using a second result from [Rei09a], that properties 
of eigenvalue-zero eigenvectors of certain bipartite graphs imply "effective" spectral gaps around 
zero, it allows quantum algorithms to be based on span programs with free inputs. This can simplify 
the implementation of a quantum walk on the corresponding graph. 

Besides allowing a relaxed balance requirement, our approach has the additional advantage of
making the constants hidden in the big-$O$ notation more explicit. The formula-evaluation quantum
algorithms in [RS08, Rei09a] evaluate certain formulas $\varphi$ using $O(\mathrm{Adv}^\pm(\varphi))$ queries, where the
hidden constant depends on the gates in $S$ in a complicated manner. It is not known how to upper-bound
the hidden constant in terms of, say, the maximum fan-in $k$ of a gate in $S$. In contrast, the
approach we follow here allows bounding this constant by an exponential in $k$.

It is known that the general adversary bound is a nearly tight lower bound on quantum query 
complexity for any boolean function [Rei09a], including in particular boolean formulas. However, 
this comes with no guarantees on time complexity. The main contribution of this paper is to 
give a nearly time-optimal algorithm for formula evaluation. The algorithm is also tight for query 
complexity, removing the extra logarithmic factor from the bound in [Rei09a]. 

Additionally, we apply the same technique to study AND-OR formulas. For this special case, 
special properties of span programs for AND and for OR gates allow the almost-balance condition 
to be significantly weakened. Ambainis et al. [ACR+07] have studied this case previously. By
applying the span program framework, we identify a slight weakness in their analysis. Tightening 
the analysis extends the algorithm's applicability to a broader class of AND-OR formulas. 

A companion paper [Rei09b] applies the span program framework to the problem of evaluating 
arbitrary AND-OR formulas. By studying the full witness size for span programs constructed 
using a novel composition method, it gives an $O(\sqrt{n}\log n)$-query quantum algorithm to evaluate a
formula of size $n$, for which the time complexity is poly-logarithmically worse after preprocessing.
This nearly matches the $\Omega(\sqrt{n})$ lower bound, and improves a $\sqrt{n}\,2^{O(\sqrt{\log n})}$-query quantum algorithm
from [ACR+07]. Ref. [Rei09b] shares the broader motivation of this paper, to study span program
properties and design techniques that lead to time-efficient quantum algorithms. 

Sections 1.1 and 1.2 below give further background on the formula-evaluation problem, for classical
and quantum algorithms. Section 1.3 precisely states our main theorem, the proof of which is
given in Section 3 after some background on span programs. The theorem for approximately balanced
AND-OR formulas is stated in Section 1.4, and proved in Section 4. An appendix revisits the
proof from [ACR+07] to prove our extension directly, without using the span program framework.

1.1 History of the formula-evaluation problem for classical algorithms 

For a function $f : \{0,1\}^n \to \{0,1\}$, let $D(f)$ be the least number of input bit queries sufficient
to evaluate $f$ on any input with zero error. $D(f)$ is known as the deterministic decision-tree
complexity of $f$, or the deterministic query complexity of $f$. Let the randomized decision-tree
complexity of $f$, $R(f) \le D(f)$, be the least expected number of queries required to evaluate $f$
with zero error (i.e., by a Las Vegas randomized algorithm). Let the Monte Carlo decision-tree
complexity, $R_2(f) = O(R(f))$, be the least number of queries required to evaluate $f$ with error
probability at most 1/3 (i.e., by a Monte Carlo randomized algorithm).

Classically, formulas over the gate set $S = \{\mathrm{NAND}_k : k \in \mathbb{N}\}$ have been studied most extensively,
where $\mathrm{NAND}_k(x_1, \ldots, x_k) = 1 - \prod_{j=1}^{k} x_j$. By De Morgan's rules, any formula over NAND
gates can also be written as a formula in which the gates at an even distance from the formula's
root are AND gates and those an odd distance away are OR gates, with some inputs or the output
possibly complemented. Thus formulas over $S$ are also known as AND-OR formulas.

For any AND-OR formula $\varphi$ of size $n$, i.e., on $n$ inputs, $D(\varphi) = n$. However, randomization
gives a strict advantage; $R(\varphi)$ and $R_2(\varphi)$ can be strictly smaller. Indeed, let $\varphi_d$ be the complete,
binary AND-OR formula of depth $d$, corresponding to the tree in which each internal vertex has
two children and every leaf is at distance $d$ from the root, with alternating levels of AND and OR
gates. Its size is $n = 2^d$. Snir [Sni85] has given a randomized algorithm for evaluating $\varphi_d$ using
in expectation $O(n^\alpha)$ queries, where $\alpha = \log_2\big(\frac{1+\sqrt{33}}{4}\big) \approx 0.753$ [SW86]. This algorithm, known as
randomized alpha-beta pruning, evaluates a random subformula recursively, and only evaluates the
second subformula if necessary. Saks and Wigderson [SW86] have given a matching lower bound on
$R(\varphi_d)$, which Santha has extended to hold for Monte Carlo algorithms, $R_2(\varphi_d) = \Omega(n^\alpha)$ [San95].
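As a rough illustration of randomized alpha-beta pruning as described above, the following Python sketch evaluates a complete, binary AND-OR formula on a list of leaf bits while counting queries. The function names, the `counter` list used to tally queries, and the convention that `is_and` gives the type of the current gate are all assumptions of this sketch, not notation from [Sni85] or [SW86].

```python
import math
import random

# Exponent quoted in the text: alpha = log2((1 + sqrt(33)) / 4).
ALPHA = math.log2((1 + math.sqrt(33)) / 4)

def alpha_beta(depth, x, lo, is_and, rng, counter):
    """Evaluate the complete binary subformula on leaves x[lo : lo + 2**depth]."""
    if depth == 0:
        counter[0] += 1               # one query to the input string
        return x[lo]
    half = 2 ** (depth - 1)
    starts = [lo, lo + half]
    rng.shuffle(starts)               # choose at random which child goes first
    decisive = 0 if is_and else 1     # a 0 decides an AND gate, a 1 decides an OR gate
    first = alpha_beta(depth - 1, x, starts[0], not is_and, rng, counter)
    if first == decisive:
        return first                  # second subformula is not needed
    return alpha_beta(depth - 1, x, starts[1], not is_and, rng, counter)

def brute_force(depth, x, lo, is_and):
    """Reference evaluation that reads every leaf."""
    if depth == 0:
        return x[lo]
    half = 2 ** (depth - 1)
    a = brute_force(depth - 1, x, lo, not is_and)
    b = brute_force(depth - 1, x, lo + half, not is_and)
    return (a & b) if is_and else (a | b)
```

The sketch never errs (it is a Las Vegas procedure) and never queries more than $n = 2^d$ leaves; the $O(n^\alpha)$ bound on its expected query count is the result cited in the text and is not re-derived here.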

Thus the query complexities have been characterized for the complete, binary AND-OR formulas.
In fact, the tight characterization works for a larger class of formulas, called "well balanced"
formulas by [San95]. This class includes, for example, alternating $\mathrm{AND}_2$-$\mathrm{OR}_2$ formulas where for
some $d$ every leaf is at depth $d$ or $d - 1$, Fibonacci trees and binomial trees [SW86]. It also includes
skew trees, for which the depth is the maximal $n - 1$.

For arbitrary AND-OR formulas, on the other hand, little is known. It has been conjectured that
complete, binary AND-OR formulas are the easiest to evaluate, and that in particular $R(\varphi) = \Omega(n^\alpha)$
for any size-$n$ AND-OR formula $\varphi$ [SW86]. However, the best general lower bound is
$R(\varphi) = \Omega(n^{0.51})$, due to Heiman and Wigderson [HW91]. Ref. [HW91] also extends the result of [SW86] to
allow for AND and OR gates with fan-in more than two.

It is perhaps not surprising that formulas over most other gate sets $S$ are even less well understood.
For example, Boppana has asked for the complexity of evaluating the complete ternary majority
($\mathrm{MAJ}_3$) formula of depth $d$ [SW86], and the best published bounds on its query complexity are
$\Omega((7/3)^d)$ and $O((2.6537\ldots)^d)$ [JKS03]. In particular, the naive, "directional," generalization of
the randomized alpha-beta pruning algorithm is to evaluate recursively two random immediate subformulas
and, if they disagree, then also the third. This algorithm uses $O((8/3)^d)$ expected queries,
and is suboptimal. This suggests that the complete $\mathrm{MAJ}_3$ formulas are significantly different from
the complete AND-OR formulas.
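The $(8/3)^d$ figure for the directional algorithm can be checked exactly on a small example. The Python sketch below computes the expected number of leaf queries by linearity of expectation, on an input in which every majority gate sees two children agreeing with its output and one disagreeing. The nested-list encoding and the names `hard_input` and `directional` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

def hard_input(depth, value=1):
    """Nested lists of bits: each gate has two children equal to its output, one opposite."""
    if depth == 0:
        return value
    return [hard_input(depth - 1, value),
            hard_input(depth - 1, value),
            hard_input(depth - 1, 1 - value)]

def directional(node):
    """Return (value, exact expected number of leaf queries) for the directional algorithm."""
    if isinstance(node, int):
        return node, Fraction(1)
    results = [directional(child) for child in node]
    values = [v for v, _ in results]
    costs = [c for _, c in results]
    value = int(sum(values) >= 2)
    total = Fraction(0)
    # Each unordered pair of children is evaluated first with probability 1/3.
    for i, j in combinations(range(3), 2):
        k = 3 - i - j                 # index of the remaining child
        total += costs[i] + costs[j]
        if values[i] != values[j]:    # disagreement forces the third evaluation
            total += costs[k]
    return value, total / 3
```

On this input each level multiplies the expected cost by $2 + 2/3 = 8/3$, since two of the three possible pairs disagree. For depth 3 the computed expectation is $(8/3)^3 = 512/27$.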

Heiman, Newman and Wigderson have considered read-once threshold formulas in an attempt
to separate the complexity classes $\mathsf{TC}^0$ from $\mathsf{NC}^1$ [HNW93]. That is, they allow the gate set to
be the set of Hamming-weight threshold gates $\{T^k_m : m, k \in \mathbb{N}\}$ defined by $T^k_m : \{0,1\}^k \to \{0,1\}$,
$T^k_m(x) = 1$ if and only if the Hamming weight of $x$ is at least $m$. AND, OR and majority gates
are all special cases of threshold gates. Heiman et al. prove that $R(\varphi) \ge n/2^d$ for $\varphi$ a threshold
formula of depth $d$, and in fact their proof extends to gate sets in which every gate "contains a
flip" [HNW93]. This implies that a large depth is necessary for the randomized complexity to be
much lower than the deterministic complexity.

Of course there are some trivial gate sets for which the query complexity is fully understood, for 
example, the set of parity gates. Overall, though, there are many more open problems than results. 
Despite its structure, formula evaluation appears to be combinatorially complicated. However, 
there is another approach, to try to leverage the power of quantum computers. Surprisingly, the 
formula-evaluation problem simplifies considerably in this different model of computation. 

1.2 History of the formula-evaluation problem for quantum algorithms 

In the quantum query model, the input bits can be queried coherently. That is, the quantum
algorithm is allowed unit-cost access to the unitary operator $O_x$, called the input oracle, defined by

    $O_x : |\varphi\rangle \otimes |j\rangle \otimes |b\rangle \mapsto |\varphi\rangle \otimes |j\rangle \otimes |b \oplus x_j\rangle$ .    (1.1)

Here $|\varphi\rangle$ is an arbitrary pure state, $\{|j\rangle : j = 1, 2, \ldots, n\}$ is an orthonormal basis for $\mathbb{C}^n$,
$\{|b\rangle : b = 0, 1\}$ is an orthonormal basis for $\mathbb{C}^2$, and $\oplus$ denotes addition mod two. $O_x$ can be
implemented efficiently on a quantum computer given a classical circuit that computes the function
$j \mapsto x_j$ [NC00]. For a function $f : \{0,1\}^n \to \{0,1\}$, let $Q(f)$ be the number of input queries
required to evaluate $f$ with error probability at most 1/3. It is immediate that $Q(f) \le R_2(f)$.

Research on the formula-evaluation problem in the quantum model began with the $n$-bit OR
function, $\mathrm{OR}_n$. Grover gave a quantum algorithm for evaluating $\mathrm{OR}_n$ with bounded one-sided error
using $O(\sqrt{n})$ oracle queries and $O(\sqrt{n}\log\log n)$ time [Gro96, Gro02]. In the classical case, on the
other hand, it is obvious that $R_2(\mathrm{OR}_n)$, $R(\mathrm{OR}_n)$ and $D(\mathrm{OR}_n)$ are all $\Theta(n)$.

Grover's algorithm can be applied recursively to speed up the evaluation of more general AND-OR
formulas. Call a formula layered if the gates at the same depth are the same. Buhrman, Cleve
and Wigderson show that a layered, depth-$d$, size-$n$ AND-OR formula can be evaluated using
$O(\sqrt{n}\log^{d-1} n)$ queries [BCW98]. The logarithmic factors come from using repetition at each level
to reduce the error probability from a constant to be polynomially small.

Høyer, Mosca and de Wolf [HMW03] consider the case of a unitary input oracle $O_x$ that maps

    $O_x : |\varphi\rangle \otimes |j\rangle \otimes |b\rangle \otimes |0\rangle \mapsto |\varphi\rangle \otimes |j\rangle \otimes \big( |b \oplus x_j\rangle \otimes |\psi_{x,j,x_j}\rangle + |\overline{b \oplus x_j}\rangle \otimes |\psi_{x,j,\overline{x_j}}\rangle \big)$ ,    (1.2)

where $|\psi_{x,j,x_j}\rangle$, $|\psi_{x,j,\overline{x_j}}\rangle$ are pure states with $\| |\psi_{x,j,x_j}\rangle \|^2 \ge 2/3$. Such an oracle can be implemented
when the function $j \mapsto x_j$ is computed by a bounded-error, randomized subroutine. Høyer et al.
allow access to $O_x$ and $O_x^{-1}$, both at unit cost, and show that $\mathrm{OR}_n$ can still be evaluated using
$O(\sqrt{n})$ queries. This robustness result implies that the $\log n$ steps of repetition used by [BCW98]
are not necessary, and a depth-$d$ layered AND-OR formula can be computed in $O(\sqrt{n}\,c^{d-1})$ queries,
for some constant $c > 1000$. If the depth is constant, this gives an $O(\sqrt{n})$-query quantum algorithm,
but the result is not useful for the complete, binary AND-OR formula, for which $d = \log_2 n$.

In 2007, Farhi, Goldstone and Gutmann presented a quantum algorithm for evaluating complete,
binary AND-OR formulas [FGG07]. Their breakthrough algorithm is not based on iterating
Grover's algorithm in any way, but instead runs a quantum walk, analogous to a classical random
walk, on a graph based on the formula. The algorithm runs in time $O(\sqrt{n})$ in a certain
continuous-time query model.

Ambainis et al. discretized the [FGG07] algorithm by reinterpreting a correspondence between
(discrete-time) random and quantum walks due to Szegedy [Sze04] as a correspondence between
continuous-time and discrete-time quantum walks [ACR+07]. Applying this correspondence to
quantum walks on certain weighted graphs, they gave an $O(\sqrt{n})$-query quantum algorithm for
evaluating "approximately balanced" AND-OR formulas. For example, $\mathrm{MAJ}_3(x_1, x_2, x_3) = (x_1 \wedge
x_2) \vee ((x_1 \vee x_2) \wedge x_3)$, so there is a size-$5^d$ AND-OR formula that computes $\mathrm{MAJ}_3^d$, the complete
ternary majority formula of depth $d$. Since the formula is approximately balanced, $Q(\mathrm{MAJ}_3^d) =
O(\sqrt{5}^{\,d})$, better than the $\Omega((7/3)^d)$ classical lower bound.
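The AND-OR expression for ternary majority used above is easy to verify exhaustively. The short Python check below compares it against the majority function on all eight inputs; the function names are chosen for the illustration only.

```python
import math
from itertools import product

def maj3(x1, x2, x3):
    """Ternary majority: 1 iff at least two of the three bits are 1."""
    return int(x1 + x2 + x3 >= 2)

def maj3_and_or(x1, x2, x3):
    """The five-leaf AND-OR expression (x1 AND x2) OR ((x1 OR x2) AND x3)."""
    return (x1 & x2) | ((x1 | x2) & x3)

def and_or_size(d):
    """Leaf count after substituting the five-leaf expression d levels deep."""
    return 5 ** d

identity_holds = all(maj3(*x) == maj3_and_or(*x)
                     for x in product((0, 1), repeat=3))
```

The expression has five leaves, so nesting it $d$ times gives $5^d$ leaves and a quantum bound of $\sqrt{5^d}$ queries up to constants. Numerically $\sqrt{5} \approx 2.236 < 7/3 \approx 2.333$, which is the comparison made in the text.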

The [ACR+07] algorithm also applies to arbitrary AND-OR formulas. If $\varphi$ has size $n$ and
depth $d$, then the algorithm, applied directly, evaluates $\varphi$ using $O(\sqrt{n}\,d)$ queries.$^1$ This can be as
bad as $O(n^{3/2})$ if the depth is $d = n$. However, Bshouty, Cleve and Eberly have given a formula
rebalancing procedure that takes an AND-OR formula $\varphi$ as input and outputs an equivalent AND-OR
formula $\varphi'$ with depth $d' = 2^{O(\sqrt{\log n})}$ and size $n' = n\,2^{O(\sqrt{\log n})}$ [BCE91, BB94]. The formula $\varphi'$
can then be evaluated using $\sqrt{n}\,2^{O(\sqrt{\log n})}$ queries.

$^1$ Actually, [ACR+07, Sec. 7] only shows a bound of $O(\sqrt{n}\,d^{3/2})$ queries, but this can be improved to $O(\sqrt{n}\,d)$
using the bounds on $\sigma_\pm(\varphi)$ below [ACR+07, Def. 1].

Our understanding of lower bounds for the formula-evaluation problem progressed in parallel
to this progress on quantum algorithms. There are essentially two techniques, the polynomial and
adversary methods, for lower-bounding quantum query complexity.



• The polynomial method, introduced in the quantum setting by Beals et al. [BBC+01], is based
on the observation that after making $q$ queries to the oracle $O_x$, the probability of any measurement
result is a polynomial of degree at most $2q$ in the variables $x_j$.

• Ambainis generalized the classical hybrid argument, to consider the system's entanglement 
when run on a superposition of inputs [Amb02]. A number of variants of Ambainis's bound 
were soon discovered, including weighted versions [HNS02, BS04, Amb06, Zha05], a spectral 
version [BSS03], and a version based on Kolmogorov complexity [LM04]. These variants can 
be asymptotically stronger than Ambainis's original unweighted bound, but are equivalent to 
each other [SS06]. We therefore term it simply "the adversary bound," denoted by Adv. 

The adversary bound is well-suited for lower-bounding the quantum query complexity for evaluating
formulas. For example, Barnum and Saks proved that for any size-$n$ AND-OR formula $\varphi$,
$\mathrm{Adv}(\varphi) = \sqrt{n}$, implying the lower bound $Q(\varphi) = \Omega(\sqrt{n})$ [BS04]. Thus the [ACR+07] algorithm is
optimal for approximately balanced AND-OR formulas, and is nearly optimal for arbitrary AND-OR
formulas. This is a considerably more complete solution than is known classically.

It is then natural to consider formulas over larger gate sets. The adversary bound continues to 
work well, because it transforms nicely under function composition: 



Theorem 1.1 (Adversary bound composition [Amb06, LLS06, HLS05]). Let $f : \{0,1\}^k \to \{0,1\}$
and let $f_j : \{0,1\}^{m_j} \to \{0,1\}$ for $j = 1, 2, \ldots, k$. Define $g : \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_k} \to \{0,1\}$ by
$g(x) = f(f_1(x_1), \ldots, f_k(x_k))$. Let $s = (\mathrm{Adv}(f_1), \ldots, \mathrm{Adv}(f_k))$. Then

    $\mathrm{Adv}(g) = \mathrm{Adv}_s(f)$ .    (1.3)

See Definition 2.1 for the definition of the adversary bound with "costs," $\mathrm{Adv}_s$. The $\mathrm{Adv}$ bound
equals $\mathrm{Adv}_s$ with uniform, unit costs $s = \vec{1}$. For a function $f$, $\mathrm{Adv}(f)$ can be computed using a
semi-definite program in time polynomial in the size of $f$'s truth table. Therefore, Theorem 1.1
gives a polynomial-time procedure for computing the adversary bound for a formula $\varphi$ over an
arbitrary finite gate set: compute the bounds for subformulas, moving from the leaves toward the
root. At an internal node $f$, having computed the adversary bounds for the input subformulas
$f_1, \ldots, f_k$, Eq. (1.3) says that the adversary bound for $g$, the subformula rooted at $f$, equals the
adversary bound for the gate $f$ with certain costs. Computing this requires $2^{O(k)}$ time, which is a
constant if $k = O(1)$. For example, if $f$ is an $\mathrm{OR}_k$ or $\mathrm{AND}_k$ gate, then $\mathrm{Adv}_{(s_1, \ldots, s_k)}(f) = \sqrt{\sum_j s_j^2}$,
from which follows immediately the [BS04] result $\mathrm{Adv}(\varphi) = \sqrt{n}$ for a size-$n$ AND-OR formula $\varphi$.

A special case of Theorem 1.1 is when the functions $f_j$ all have equal adversary bounds, so
$\mathrm{Adv}(g) = \mathrm{Adv}(f)\,\mathrm{Adv}(f_1)$. In particular, for a function $f : \{0,1\}^k \to \{0,1\}$ and a natural number
$d \in \mathbb{N}$, let $f^d : \{0,1\}^{k^d} \to \{0,1\}$ denote the complete, depth-$d$ formula over $f$. That is, $f^1 = f$ and
$f^d(x) = f\big(f^{d-1}(x_1, \ldots, x_{k^{d-1}}), \ldots, f^{d-1}(x_{k^d - k^{d-1} + 1}, \ldots, x_{k^d})\big)$ for $d > 1$. Then we obtain:

Corollary 1.2. For any function $f : \{0,1\}^k \to \{0,1\}$,

    $\mathrm{Adv}(f^d) = \mathrm{Adv}(f)^d$ .    (1.4)
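The leaves-to-root procedure described after Theorem 1.1 is especially simple for AND-OR formulas, where the cost-weighted bound of an AND or OR gate is $\sqrt{\sum_j s_j^2}$. The Python sketch below applies that rule recursively. The encoding (the string "x" for a leaf, a list of subformulas for a gate) and the function names are assumptions for this illustration; gates other than AND and OR would require solving a small SDP, which is not attempted here.

```python
import math

def adv_and_or(formula):
    """Adversary bound of a read-once AND-OR formula, computed from the leaves up.

    formula is "x" for a leaf, or a list of subformulas feeding one AND or OR gate.
    """
    if formula == "x":
        return 1.0                    # a single input bit has adversary bound 1
    # Costs s_j are the bounds of the input subformulas; the gate contributes
    # sqrt(sum_j s_j^2), the rule stated in the text for AND and OR gates.
    return math.sqrt(sum(adv_and_or(sub) ** 2 for sub in formula))

def size(formula):
    """Number of leaves n."""
    if formula == "x":
        return 1
    return sum(size(sub) for sub in formula)

# An unbalanced seven-leaf example, shaped like ([(x AND x) OR x] AND x) OR (x AND [x OR x]).
example = [[[["x", "x"], "x"], "x"], ["x", ["x", "x"]]]
```

Because the squares of the costs simply add up at every gate, the recursion returns $\sqrt{n}$ regardless of the tree's shape, which matches the [BS04] result quoted above.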

In particular, Ambainis defined a boolean function $f : \{0,1\}^4 \to \{0,1\}$ that can be represented
exactly by a polynomial of degree two, but for which $\mathrm{Adv}(f) = 5/2$ [Amb06]. Thus $f^d$ can be
represented exactly by a polynomial of degree $2^d$, but by Corollary 1.2, $\mathrm{Adv}(f^d) = (5/2)^d$. For this
function, the adversary bound is strictly stronger than any bound obtainable using the polynomial
method. Many similar examples are given in [HLS06]. However, for other functions, the adversary
bound is asymptotically worse than the polynomial method [SS06, AS04, Amb05].

In 2007, though, Høyer et al. discovered a strict generalization of $\mathrm{Adv}$ that also lower-bounds
quantum query complexity [HLS07]. We call this new bound the general adversary bound, or $\mathrm{Adv}^\pm$.
For example, for Ambainis's four-bit function $f$, $\mathrm{Adv}^\pm(f) \ge 2.51$ [HLS06]. Like the adversary
bound, $\mathrm{Adv}^\pm(f)$ can be computed in time polynomial in the size of $f$'s truth table, and also
composes nicely:

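Theorem 1.3 ([HLS07, Rei09a]). Under the conditions of Theorem 1.1, with costs now given by
$s = (\mathrm{Adv}^\pm(f_1), \ldots, \mathrm{Adv}^\pm(f_k))$,

    $\mathrm{Adv}^\pm(g) = \mathrm{Adv}^\pm_s(f)$ .    (1.5)

In particular, if $\mathrm{Adv}^\pm(f_1) = \cdots = \mathrm{Adv}^\pm(f_k)$, then $\mathrm{Adv}^\pm(g) = \mathrm{Adv}^\pm(f)\,\mathrm{Adv}^\pm(f_1)$.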

Define a formula $\varphi$ to be adversary balanced if at each internal node, the general adversary
bounds of the input subformulas are equal. In particular, by Theorem 1.3 this implies that $\mathrm{Adv}^\pm(\varphi)$
is equal to the product of the general adversary bounds of the gates along any path from the root
to a leaf. Complete, layered formulas are an example of adversary-balanced formulas.

Returning to upper bounds, Reichardt and Špalek [RS08] generalized the algorithmic approach
started by [FGG07]. They gave an optimal quantum algorithm for evaluating adversary-balanced
formulas over a considerably extended gate set, including in particular all functions $\{0,1\}^k \to \{0,1\}$
for $k \le 3$, 69 inequivalent four-bit functions, and the gates $\mathrm{AND}_k$, $\mathrm{OR}_k$, $\mathrm{PARITY}_k$ and $\mathrm{EQUAL}_k$,
for $k = O(1)$. For example, $Q(\mathrm{MAJ}_3^d) = \Theta(2^d)$.

The [RS08] result follows from a framework for developing formula-evaluation quantum algorithms
based on span programs. A span program, introduced by Karchmer and Wigderson [KW93],
is a certain linear-algebraic way of defining a function, which corresponds closely to eigenvalue-zero
eigenvectors of certain bipartite graphs. [RS08] derived a quantum algorithm for evaluating certain
concatenated span programs, with a query complexity upper-bounded by the span program witness
size, denoted wsize. In particular, a special case of [RS08, Theorem 4.7] is:

Theorem 1.4 ([RS08]). Fix a function $f : \{0,1\}^k \to \{0,1\}$. If span program $P$ computes $f$, then

    $Q(f^d) = O(\mathrm{wsize}(P)^d)$ .    (1.6)

From Theorem 1.3, this result is optimal if $\mathrm{wsize}(P) = \mathrm{Adv}^\pm(f)$. The question therefore
becomes how to find optimal span programs. Using an ad hoc search, [RS08] found optimal span
programs for a variety of functions with $\mathrm{Adv}^\pm = \mathrm{Adv}$. Further work automated the search, by
giving a semi-definite program (SDP) for the optimal span program witness size for any given
function [Rei09a]. Remarkably, the SDP's value always equals the general adversary bound:

Theorem 1.5 ([Rei09a]). For any function $f : \{0,1\}^n \to \{0,1\}$,

    $\inf_P \mathrm{wsize}(P) = \mathrm{Adv}^\pm(f)$ ,    (1.7)

where the infimum is over span programs $P$ computing $f$. Moreover, this infimum is achieved.

This result greatly extends the gate set over which the formula-evaluation algorithm of [RS08]
works optimally. For example, combined with Theorem 1.4, it implies that $\lim_{d \to \infty} Q(f^d)^{1/d} =
\mathrm{Adv}^\pm(f)$ for every boolean function $f$. More generally, Theorem 1.5 allows the [RS08] algorithm
to be run on formulas over any finite gate set $S$. A factor is lost that depends on the gates in $S$,
but it will be a constant for $S$ finite. Combining Theorem 1.5 with [RS08, Theorem 4.7] gives:

Theorem 1.6 ([Rei09a]). Let $S$ be a finite set of gates. Then there exists a quantum algorithm that
evaluates an adversary-balanced formula $\varphi$ over $S$ using $O(\mathrm{Adv}^\pm(\varphi))$ input queries. After efficient
classical preprocessing independent of the input $x$, and assuming unit-time coherent access to the
preprocessed classical string, the running time of the algorithm is $\mathrm{Adv}^\pm(\varphi)\,(\log \mathrm{Adv}^\pm(\varphi))^{O(1)}$.

In the discussion so far, we have for simplicity focused on query complexity. The query complexity
is an information-theoretic quantity that does not charge for operations independent of the
input string, even though these operations may require many elementary gates to implement. For 
practical algorithms, it is important to be able to bound the algorithm's running time, which counts 
the cost of implementing the input-independent operations. Theorem 1.6 puts an optimal bound 
on the query complexity, and also puts a nearly optimal bound on the algorithm's time complexity. 
In fact, all of the query-optimal algorithms so far discussed are also nearly time optimal. 

In general, though, an upper bound on the query complexity does not imply an upper bound 
on the time complexity. Ref. [Rei09a] also generalized the span program framework of [RS08] to 
apply to quantum algorithms not based on formulas. The main result of [Rei09a] is: 

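Theorem 1.7 ([Rei09a]). For any function $f : \mathcal{D} \to \{1, 2, \ldots, m\}$, with $\mathcal{D} \subseteq \{0,1\}^n$, $Q(f)$ satisfies

    $Q(f) = \Omega(\mathrm{Adv}^\pm(f))$ and $Q(f) = O\Big(\mathrm{Adv}^\pm(f)\,\frac{\log \mathrm{Adv}^\pm(f)}{\log\log \mathrm{Adv}^\pm(f)}\,\log(m)\log\log m\Big)$ .    (1.8)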

Theorem 1.7 in particular allows us to compute the query complexity of formulas, up to the 
logarithmic factor. It does not give any guarantees on running time. However, the analysis required 
to prove Theorem 1.7 also leads to significantly simpler proofs of Theorem 1.6 and the AND-OR 
formula results of [ACR+07, FGG07]. Moreover, we will see that it allows the formula-evaluation
algorithms to be extended to formulas that are not adversary balanced. 

1.3 Quantum algorithm for evaluating almost-balanced formulas 

We give a formula-evaluation algorithm that is both query-optimal, without a logarithmic overhead, 
and, after an efficient preprocessing step, nearly time optimal. Define almost balance as follows: 

Definition 1.8. Consider a formula $\varphi$ over a gate set $S$. For a vertex $v$ in the corresponding
tree, let $\varphi_v$ denote the subformula of $\varphi$ rooted at $v$, and, if $v$ is an internal vertex, let $g_v$ be the
corresponding gate. The formula $\varphi$ is $\beta$-balanced if for every vertex $v$, with children $c_1, c_2, \ldots, c_k$,

    $\frac{\max_j \mathrm{Adv}^\pm(\varphi_{c_j})}{\min_j \mathrm{Adv}^\pm(\varphi_{c_j})} \le \beta$ .    (1.9)

(If $c_j$ is a leaf, $\mathrm{Adv}^\pm(\varphi_{c_j}) = 1$.) Formula $\varphi$ is almost balanced if it is $\beta$-balanced for some $\beta = O(1)$.
In particular, an adversary-balanced formula is 1-balanced. We will show:

Theorem 1.9. Let $S$ be a fixed, finite set of gates. Then there exists a quantum algorithm that
evaluates an almost-balanced formula $\varphi$ over $S$ using $O(\mathrm{Adv}^\pm(\varphi))$ input queries. After polynomial-time
classical preprocessing independent of the input, and assuming unit-time coherent access to
the preprocessed string, the running time of the algorithm is $\mathrm{Adv}^\pm(\varphi)\,(\log \mathrm{Adv}^\pm(\varphi))^{O(1)}$.

Theorem 1.9 is significantly stronger than Theorem 1.6, which requires exact balance. There
are important classes of exactly balanced formulas, such as complete, layered formulas. In fact, it is
sufficient that the multiset of gates along the simple path from the root to a leaf not depend on the
leaf. Moreover, sometimes different gates have the same $\mathrm{Adv}^\pm$ bound; see [HLS06] for examples.
Even so, exact adversary balance is a very strict condition.
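To make the balance condition of Definition 1.8 concrete, the following Python sketch computes the smallest $\beta$ for which a given tree is $\beta$-balanced. It is restricted to AND-OR formulas and assumes, as a simplification for this example, that the general adversary bound of an AND or OR gate with costs $s$ is $\sqrt{\sum_j s_j^2}$, the same rule the text states for $\mathrm{Adv}$. The tree encoding and the function names are likewise hypothetical.

```python
import math

def adv_pm(formula):
    """Assumed general adversary bound of an AND-OR formula ("x" = leaf, list = gate)."""
    if formula == "x":
        return 1.0                    # Definition 1.8: a leaf has bound 1
    # Assumption of this sketch: AND/OR gates combine costs as sqrt(sum of squares).
    return math.sqrt(sum(adv_pm(child) ** 2 for child in formula))

def balance(formula):
    """Smallest beta such that the formula is beta-balanced (Eq. (1.9) at every vertex)."""
    if formula == "x":
        return 1.0
    bounds = [adv_pm(child) for child in formula]
    here = max(bounds) / min(bounds)  # ratio at this vertex
    return max([here] + [balance(child) for child in formula])

complete = [["x", "x"], ["x", "x"]]   # adversary balanced, so beta = 1
skew = ["x", ["x", ["x", "x"]]]       # skew tree on four leaves
```

For the complete tree the ratio is 1 at every vertex. For the skew tree the root compares a leaf (bound 1) against a three-leaf subformula (bound $\sqrt{3}$), so $\beta = \sqrt{3}$; for a skew tree on $n$ leaves this ratio grows like $\sqrt{n}$, so such trees are not almost balanced.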

The proof of Theorem 1.9 is based on the span program framework developed in Ref. [Rei09a]. 
In particular, [Rei09a, Theorem 9.1] gives two quantum algorithms for evaluating span programs.
The first algorithm is based on a discrete-time simulation of a continuous-time quantum walk. 
It applies to arbitrary span programs, and is used, in combination with Theorem 1.5, to prove 
Theorem 1.7. However, the simulation incurs a logarithmic query overhead and potentially worse 
time complexity overhead, so this algorithm is not suitable for proving Theorem 1.9. 

The second algorithm in [Rei09a] is based directly on a discrete-time quantum walk, similar 
to previous optimal formula-evaluation algorithms [ACR+07, RS08]. However, this algorithm does
not apply to an arbitrary span program. A bound is needed on the operator norm of the entry-wise 
absolute value of the weighted adjacency matrix for a corresponding graph. Further graph sparsity 
conditions are needed for the algorithm to be time efficient (see Theorem 2.4). 

Unfortunately, the span program from Theorem 1.5 will not generally satisfy these conditions. 
Theorem 1.5 gives a canonical span program ([Rei09a, Def. 5.1]). Even for a simple formula, the 
optimal canonical span program will typically correspond to a dense graph with large norm. 

An example should clarify the problem. Consider the AND-OR formula $\psi(x) = \big([(x_1 \wedge x_2) \vee
x_3] \wedge x_4\big) \vee \big(x_5 \wedge [x_6 \vee x_7]\big)$, and consider the two graphs in Figure 1. For an input $x \in \{0,1\}^7$,
modify the graphs by attaching dangling edges to every vertex $j$ for which $x_j = 0$. Observe then
that each graph has an eigenvalue-zero eigenvector supported on vertex 0 (called a witness) if and
only if $\psi(x) = 1$. The graphs correspond to different span programs computing $\psi$, and the quantum
algorithm works essentially by running a quantum walk starting at vertex 0 in order to detect the
witness. The graph on the left is a significantly simplified version of a canonical span program for
$\psi$, and its density still makes it difficult to implement the quantum walk.

We will be guided by the second, simpler graph. Instead of applying Theorem 1.5 to $\varphi$ as a
whole, we apply it separately to every gate in the formula. We then compose these span programs,
one per gate, according to the formula, using direct-sum composition (Definition 2.5). In terms of
graphs, direct-sum composition attaches the output vertex of one span program's graph to an input
vertex of the next [RS08]. This leads to a graph whose structure somewhat follows the structure of
the formula $\varphi$, as the graph in Figure 1(b) follows the structure of $\psi$. (However, the general case
will be more complicated than shown, as we are plugging together constant-size graph gadgets, and
there may be duplication of some subgraphs.)

Direct-sum composition keeps the maximum degree and norm of the graph under control: each
is at most twice its value for the worst single gate. Therefore the second [Rei09a] algorithm applies.
However, direct-sum composition also leads to additional overhead. In particular, a witness in the
first graph will be supported only on numbered vertices (note that the graph is bipartite), whereas
a witness in the second graph will be supported on some of the internal vertices as well. This means
roughly that the second witness will be harder to detect, because after normalization its overlap on
vertex 0 will be smaller. Scale both witnesses so that the amplitude on vertex 0 is one. The witness
size (wsize) measures the squared length of the witness only on numbered vertices, whereas the full
witness size (fwsize) measures the squared length on all vertices. For [Rei09a], it was sufficient to
consider only span program witness size, because for canonical span programs like in Figure 1(a)
the two measures are equal. (For technical reasons, we will actually define fwsize to be 1 + wsize
even in this case.) For our analysis, we will need to bound the full witness size in terms of the
witness size. We maintain this bound in a recursion from the formula's leaves toward its root.

Figure 1: Graphs (a) and (b) corresponding to two span programs both computing the same function.

A span program is called strict if every vertex on one half of the bipartite graph is either an 
input vertex (vertices 1-7 in the graphs of Figure 1) or the output vertex (vertex 0). Thus the first 
graph in the example above corresponds to a strict span program, and the second does not. The 
original definition of span programs, in [KW93], allowed for only strict span programs. This was 
sensible because any other vertices on the input/output part of the graph's bipartition can always 
be projected away, yielding a strict span program that computes the same function. For developing 
time-efficient quantum algorithms, though, it seems important to consider span programs that are
not strict. Unfortunately, going backwards, e.g., from Figure 1(a) to Figure 1(b), is probably
difficult in general.

Theorem 1.9 does not follow from the formula-evaluation techniques of [RS08], together with
Theorem 1.4 from [Rei09a]. This tempting approach falls into intractable technical difficulties. In
particular, the same span program can be used at two vertices $v$ and $w$ in $\varphi$ only if $g_v = g_w$ and
the general adversary bounds of $v$'s input subformulas are the same as those for $w$'s inputs up
to simultaneous scaling. In general, then, an almost-balanced formula will require an unbounded
number of different span programs. However, the analysis in [RS08] loses a factor that depends
badly on the individual span programs. Since the dependence is not continuous, even showing
that the span programs in use all lie within a compact set would not be sufficient to obtain an
$O(1)$ upper bound. In contrast, the approach we follow here allows bounding the lost factor by an
exponential in $k$, uniformly over different gate imbalances.

1.4 Quantum algorithm to evaluate approximately balanced AND-OR formulas



Ambainis et al. [ACR+07] use a weaker balance criterion for AND-OR formulas than Definition 1.8.
They define an AND-OR formula to be approximately balanced if $\sigma_-(\varphi) = O(1)$ and $\sigma_+(\varphi) = O(n)$.
Here $n$ is the size of the formula, i.e., the number of leaves, and $\sigma_-(\varphi)$ and $\sigma_+(\varphi)$ are defined by:

Definition 1.10. For each vertex $v$ in a formula $\varphi$, let

$$\sigma_-(v) = \max_{\xi} \sum_{w \in \xi} \frac{1}{\mathrm{Adv}^{\pm}(\varphi_w)} , \qquad
\sigma_+(v) = \max_{\xi} \sum_{w \in \xi} \mathrm{Adv}^{\pm}(\varphi_w)^2 , \tag{1.10}$$

with each maximum taken over all simple paths $\xi$ from $v$ to a leaf. Let $\sigma_\pm(\varphi) = \sigma_\pm(r)$, where $r$ is
the root of $\varphi$.
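
To make Definition 1.10 concrete for AND-OR formulas, where $\mathrm{Adv}^{\pm}(\varphi_w)$ is the square root of the number of leaves below $w$, the two quantities can be computed by a simple recursion. The following is a minimal sketch under that assumption; the tree encoding (`LEAF` and nested tuples) and the function names are hypothetical choices made for illustration.

```python
import math

LEAF = "leaf"   # a formula is LEAF or a tuple of subformulas (one AND/OR gate)

def size(f):
    """Number of leaves of the formula f."""
    return 1 if f == LEAF else sum(size(c) for c in f)

def sigma_minus(f):
    """Max over paths from the root of f to a leaf of the sum of 1/Adv(phi_w),
    using Adv(phi_w) = sqrt(size(phi_w)) for AND-OR formulas."""
    term = 1.0 / math.sqrt(size(f))
    return term if f == LEAF else term + max(sigma_minus(c) for c in f)

def sigma_plus(f):
    """Max over paths from the root of f to a leaf of the sum of Adv(phi_w)^2."""
    term = size(f)
    return term if f == LEAF else term + max(sigma_plus(c) for c in f)

gate = (LEAF, LEAF)        # a single fan-in-two gate
balanced = (gate, gate)    # balanced formula on four leaves
print(sigma_minus(gate), sigma_plus(gate))
print(sigma_minus(balanced), sigma_plus(balanced))
```

The gate type plays no role here because only subformula sizes enter the definition. The recursion works because the best path from $v$ consists of $v$ itself followed by the best path from one of its children.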

Recall that $\mathrm{Adv}^{\pm}(\varphi) = \mathrm{Adv}(\varphi) = \sqrt{n}$ for an AND-OR formula. Definition 1.8 is a stricter
balance criterion because $\beta$-balance of a formula $\varphi$ implies (by Lemma 3.2) that $\sigma_-(\varphi)$ and $\sigma_+(\varphi)$ are
both dominated by geometric series. However, the same steps followed by the proof of Theorem 1.9
still suffice for proving the [ACR+07] result, and, in fact, for strengthening it. We show:

Theorem 1.11. Let $\varphi$ be an AND-OR formula of size $n$. Then after polynomial-time classical
preprocessing that does not depend on the input $x$, $\varphi(x)$ can be evaluated by a quantum algorithm
with error at most 1/3 using $O(\sqrt{n}\,\sigma_-(\varphi))$ input queries. The algorithm's running time is
$\sqrt{n}\,\sigma_-(\varphi)\,(\log n)^{O(1)}$ assuming unit-cost coherent access to the preprocessed string.

For the special case of AND-OR formulas with $\sigma_-(\varphi) = O(1)$, Theorem 1.11 strengthens
Theorem 1.9. The requirement that $\sigma_-(\varphi) = O(1)$ allows for some gates in the formula to be very
unbalanced. Theorem 1.11 also strengthens [ACR+07, Theorem 1] because it does not require that
$\sigma_+(\varphi) = O(n)$. For example, a formula that is biased near the root, but balanced at greater depths
can have $\sigma_-(\varphi) = O(1)$ and $\sigma_+(\varphi) = \omega(n)$. By substituting the bound $\sigma_-(\varphi) = O(\sqrt{d})$ for a
depth-$d$ formula [ACR+07, Def. 3], a corollary of Theorem 1.11 is that a depth-$d$, size-$n$ AND-OR formula
can be evaluated using $O(\sqrt{nd})$ queries. This improves the depth-dependence from [ACR+07], and
matches the dependence from an earlier version of that article [Amb07].

The essential reason that the Definition 1.8 balance condition can be weakened is that for the 
specific gates AND and OR, by writing out the optimal span programs explicitly we can prove that 
they satisfy stronger properties than are necessarily true for other functions. 



2 Span programs 
2.1 Definitions 

We briefly recall some definitions from [Rei09a, Sec. 2]. Additionally, we define a span program 
complexity measure, the full witness size, that charges even for the "free" inputs. This quantity is 
important for developing quantum algorithms that are time efficient as well as query efficient. 

For a natural number $n$, let $[n] = \{1, 2, \ldots, n\}$. For a finite set $X$, let $\mathbf{C}^X$ be the inner product
space $\mathbf{C}^{|X|}$ with orthonormal basis $\{|x\rangle : x \in X\}$. For vector spaces $V$ and $W$ over $\mathbf{C}$, let $\mathcal{L}(V, W)$
be the set of linear transformations from $V$ into $W$, and let $\mathcal{L}(V) = \mathcal{L}(V, V)$. For $A \in \mathcal{L}(V, W)$,
$\|A\|$ is the operator norm of $A$. For a string $x \in \{0,1\}^n$, let $\bar{x}$ denote its bitwise complement.

Definition 2.1 ([HLS05, HLS07]). For finite sets $C$, $E$ and $\mathcal{D} \subseteq C^n$, let $f : \mathcal{D} \to E$. An adversary
matrix for $f$ is a real, symmetric matrix $\Gamma \in \mathcal{L}(\mathbf{C}^{\mathcal{D}})$ that satisfies $\langle x|\Gamma|y\rangle = 0$ whenever $f(x) = f(y)$.
The general adversary bound for $f$, with costs $s \in [0,\infty)^n$, is

$$\mathrm{Adv}^{\pm}_s(f) = \max_{\substack{\text{adversary matrices } \Gamma: \\ \forall j \in [n],\ \|\Gamma \circ \Delta_j\| \le s_j}} \|\Gamma\| . \tag{2.1}$$

Here $\Gamma \circ \Delta_j$ denotes the entry-wise matrix product between $\Gamma$ and $\Delta_j = \sum_{x,y :\, x_j \ne y_j} |x\rangle\langle y|$. The
(nonnegative-weight) adversary bound for $f$, with costs $s$, is defined by the same maximization,
except with $\Gamma$ restricted to have nonnegative entries. In particular, $\mathrm{Adv}^{\pm}_s(f) \ge \mathrm{Adv}_s(f)$.

Letting $\vec{1} = (1, 1, \ldots, 1)$, the adversary bound for $f$ is $\mathrm{Adv}(f) = \mathrm{Adv}_{\vec{1}}(f)$ and the general
adversary bound for $f$ is $\mathrm{Adv}^{\pm}(f) = \mathrm{Adv}^{\pm}_{\vec{1}}(f)$. By [HLS07], $Q(f) = \Omega(\mathrm{Adv}^{\pm}(f))$.

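Definition 2.2 (Span program [KW93]). A span program $P$ consists of a natural number $n$, a
finite-dimensional inner product space $V$ over $\mathbf{C}$, a "target" vector $|t\rangle \in V$, disjoint sets $I_{\mathrm{free}}$ and
$I_{j,b}$ for $j \in [n]$, $b \in \{0,1\}$, and "input vectors" $|v_i\rangle \in V$ for $i \in I_{\mathrm{free}} \cup \bigcup_{j \in [n],\, b \in \{0,1\}} I_{j,b}$.
To $P$ corresponds a function $f_P : \{0,1\}^n \to \{0,1\}$, defined on $x \in \{0,1\}^n$ by

$$f_P(x) = \begin{cases} 1 & \text{if } |t\rangle \in \mathrm{Span}(\{|v_i\rangle : i \in I_{\mathrm{free}} \cup \bigcup_{j \in [n]} I_{j,x_j}\}) \\ 0 & \text{otherwise} \end{cases} \tag{2.2}$$

Some additional notation is convenient. Fix a span program $P$. Let $I = I_{\mathrm{free}} \cup \bigcup_{j \in [n],\, b \in \{0,1\}} I_{j,b}$.
Let $A \in \mathcal{L}(\mathbf{C}^I, V)$ be given by $A = \sum_{i \in I} |v_i\rangle\langle i|$. For $x \in \{0,1\}^n$, let $I(x) = I_{\mathrm{free}} \cup \bigcup_{j \in [n]} I_{j,x_j}$ and
$\Pi(x) = \sum_{i \in I(x)} |i\rangle\langle i| \in \mathcal{L}(\mathbf{C}^I)$. Then $f_P(x) = 1$ if $|t\rangle \in \mathrm{Range}(A\Pi(x))$. A vector $|w\rangle \in \mathbf{C}^I$ is said
to be a witness for $f_P(x) = 1$ if $\Pi(x)|w\rangle = |w\rangle$ and $A|w\rangle = |t\rangle$. A vector $|w'\rangle \in V$ is said to be a
witness for $f_P(x) = 0$ if $\langle t|w'\rangle = 1$ and $\Pi(x)A^\dagger|w'\rangle = 0$.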
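
The function computed by a span program can be checked directly from Definition 2.2 by testing whether the target lies in the span of the available input vectors. The sketch below does this with exact rational arithmetic; the helper names `rank` and `evaluate`, the data layout, and the two toy span programs (with rational entries, not the weighted programs of Definition 4.1) are assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of equal-length vectors, by Gaussian elimination over Q."""
    m = [[Fraction(a) for a in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def evaluate(target, free, inputs, x):
    """f_P(x) = 1 iff target is in the span of the free vectors and of the
    vectors indexed by I_{j, x_j}; inputs[j][b] lists the vectors of I_{j,b}."""
    available = list(free) + [v for j, bit in enumerate(x) for v in inputs[j][bit]]
    # target is in the span exactly when appending it does not raise the rank
    return int(rank(available) == rank(available + [target]))

# Toy monotone span programs: I_{j,0} is empty, I_{j,1} holds one vector.
AND_P = dict(target=(1, 1), free=[], inputs=[([], [(1, 0)]), ([], [(0, 1)])])
OR_P = dict(target=(1,), free=[], inputs=[([], [(1,)]), ([], [(1,)])])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, evaluate(x=x, **AND_P), evaluate(x=x, **OR_P))
```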

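Definition 2.3 (Witness size). Consider a span program $P$, and a vector $s \in [0,\infty)^n$ of nonnegative
"costs." Let $S = \sum_{j \in [n],\, b \in \{0,1\},\, i \in I_{j,b}} \sqrt{s_j}\, |i\rangle\langle i| \in \mathcal{L}(\mathbf{C}^I)$. For each input $x \in \{0,1\}^n$, define the witness
size of $P$ on $x$ with costs $s$, $\mathrm{wsize}_s(P, x)$, as follows:

$$\mathrm{wsize}_s(P, x) = \begin{cases}
\min_{|w\rangle :\, A\Pi(x)|w\rangle = |t\rangle} \|S|w\rangle\|^2 & \text{if } f_P(x) = 1 \\
\min_{|w'\rangle :\, \langle t|w'\rangle = 1,\ \Pi(x)A^\dagger|w'\rangle = 0} \|SA^\dagger|w'\rangle\|^2 & \text{if } f_P(x) = 0
\end{cases} \tag{2.3}$$

The witness size of $P$ with costs $s$ is

$$\mathrm{wsize}_s(P) = \max_{x \in \{0,1\}^n} \mathrm{wsize}_s(P, x) . \tag{2.4}$$

Define the full witness size $\mathrm{fwsize}_s(P)$ by letting $S^f = S + \sum_{i \in I_{\mathrm{free}}} |i\rangle\langle i|$ and

$$\mathrm{fwsize}_s(P, x) = \begin{cases}
\min_{|w\rangle :\, A\Pi(x)|w\rangle = |t\rangle} \big(1 + \|S^f|w\rangle\|^2\big) & \text{if } f_P(x) = 1 \\
\min_{|w'\rangle :\, \langle t|w'\rangle = 1,\ \Pi(x)A^\dagger|w'\rangle = 0} \big(\||w'\rangle\|^2 + \|SA^\dagger|w'\rangle\|^2\big) & \text{if } f_P(x) = 0
\end{cases} \tag{2.5}$$

$$\mathrm{fwsize}_s(P) = \max_{x \in \{0,1\}^n} \mathrm{fwsize}_s(P, x) . \tag{2.6}$$

When the subscript $s$ is omitted, the costs are taken to be uniform, $s = \vec{1} = (1, 1, \ldots, 1)$, e.g.,
$\mathrm{fwsize}(P) = \mathrm{fwsize}_{\vec{1}}(P)$. The witness size is defined in [RS08]. The full witness size is defined
in [Rei09a, Sec. 8], but is not named there. A strict span program has $I_{\mathrm{free}} = \emptyset$, so $S^f = S$, and a
monotone span program has $I_{j,0} = \emptyset$ for all $j$ [Rei09a, Def. 4.9].

2.2 Quantum algorithm to evaluate a span program based on its full witness size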



[Rei09a, Theorem 9.3] gives a quantum query algorithm for evaluating span programs based on the
full witness size. The algorithm is based on a quantum walk on a certain graph. Provided that the
degree of the graph is not too large, it can actually be implemented efficiently.

Theorem 2.4 ([Rei09a, Theorem 9.3]). Let $P$ be a span program. Then $f_P$ can be evaluated using

$$T = O\big(\mathrm{fwsize}(P)\, \|\mathrm{abs}(A_{G_P})\|\big) \tag{2.7}$$

quantum queries, with error probability at most 1/3. Moreover, if the maximum degree of a vertex
in $G_P$ is $d$, then the time complexity of the algorithm for evaluating $f_P$ is at most a factor of
$(\log d)\,(\log(T \log d))^{O(1)}$ worse, after classical preprocessing and assuming constant-time coherent
access to the preprocessed string.

Proof sketch. The query complexity claim is actually slightly weaker than [Rei09a, Theorem 9.3],
which allows the target vector to be scaled downward by a factor of $\sqrt{\mathrm{fwsize}(P)}$.

The time-complexity claim will follow from the proof of [Rei09a, Theorem 9.3], in [Rei09a,
Prop. 9.4, Theorem 9.5]. The algorithm for evaluating $f_P(x)$ uses a discrete-time quantum walk
on the graph $G_P(x)$. If the maximum degree of a vertex in $G_P$ is $d$, then each coin reflection
can be implemented using $O(\log d)$ single-qubit unitaries and queries to the preprocessed
string [GR02, CNW09]. Finally, the $(\log(T \log d))^{O(1)}$ factor comes from applying the Solovay-Kitaev
Theorem [KSV02] to compile the single-qubit unitaries into products of elementary gates,
to precision $1/O(T \log d)$. □

We remark that together with [Rei09a, Theorem 3.1] , Theorem 2.4 gives a way of transforming a 
one-sided-error quantum algorithm into a span program, and back into a quantum algorithm, such 
that the time complexity is nearly preserved, after preprocessing. This is only a weak equivalence, 
because aside from requiring preprocessing the algorithm from Theorem 2.4 also has two-sided error. 
To some degree, though, it complements the equivalence results for best span program witness size 
and bounded-error quantum query complexity [Rei09a, Theorem 7.1, Theorem 9.2]. 



2.3 Direct-sum span program composition 

Let us study the full witness size of the direct-sum composition of span programs. We begin by 
recalling the definition of direct-sum composition. 

Let $f : \{0,1\}^n \to \{0,1\}$ and $S \subseteq [n]$. For $j \in [n]$, let $m_j$ be a natural number, with $m_j = 1$ for
$j \notin S$. For $j \in S$, let $f_j : \{0,1\}^{m_j} \to \{0,1\}$. Define $y : \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_n} \to \{0,1\}^n$ by

$$y(x)_j = \begin{cases} f_j(x_j) & \text{if } j \in S \\ x_j & \text{if } j \notin S \end{cases} \tag{2.8}$$

Define $g : \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_n} \to \{0,1\}$ by $g(x) = f(y(x))$. For example, if $S = [n] \setminus \{1\}$, then

$$g(x) = f\big(x_1, f_2(x_2), \ldots, f_n(x_n)\big) . \tag{2.9}$$

Given span programs for the individual functions $f$ and $f_j$ for $j \in S$, we will construct a span
program for $g$. We remark that although we are here requiring that the inner functions $f_j$ act on
disjoint sets of bits, this assumption is not necessary for the definition. It simplifies the notation,
though, for the cases $S \ne [n]$, and will suffice for our applications.
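
The composed function $g(x) = f(y(x))$ is straightforward to express in code, which may help in reading the index conventions. This is a minimal sketch, assuming the input is given as a list of bit blocks $x_1, \ldots, x_n$ and that positions outside $S$ are marked by `None`; the names `compose`, `or_gate` and `and_gate` are hypothetical.

```python
def compose(f, inner):
    """Return g with g(x) = f(y(x)).

    inner[j] is None when j is not in S (the block x_j is a single bit that is
    passed through), and is the inner function f_j otherwise."""
    def g(blocks):
        y = tuple(block[0] if f_j is None else f_j(block)
                  for f_j, block in zip(inner, blocks))
        return f(y)
    return g

def or_gate(bits):
    return int(any(bits))

def and_gate(bits):
    return int(all(bits))

# n = 2 and S = {2}: g(x) = OR(x_1, AND(x_2)), with x_2 a block of three bits.
g = compose(or_gate, [None, and_gate])
print(g([(0,), (1, 1, 1)]), g([(0,), (1, 0, 1)]))
```

Keeping each block on its own disjoint set of bits mirrors the simplifying assumption made in the text.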

Let $P$ be a span program computing $f_P = f$. Let $P$ have inner product space $V$, target vector
$|t\rangle$ and input vectors $|v_i\rangle$ indexed by $I_{\mathrm{free}}$ and $I_{jc}$ for $j \in [n]$ and $c \in \{0,1\}$.

For $j \in [n]$, let $s_j \in [0,\infty)^{m_j}$ be a vector of costs, and let $s \in [0,\infty)^{\sum_j m_j}$ be the concatenation of
the vectors $s_j$. For $j \in S$, let $P^{j0}$ and $P^{j1}$ be span programs computing $f_{P^{j1}} = f_j : \{0,1\}^{m_j} \to \{0,1\}$
and $f_{P^{j0}} = \neg f_j$, with $r_j = \mathrm{wsize}_{s_j}(P^{j0}) = \mathrm{wsize}_{s_j}(P^{j1})$. For $c \in \{0,1\}$, let $P^{jc}$ have inner product
space $V^{jc}$ with target vector $|t^{jc}\rangle$ and input vectors indexed by $I^{jc}_{\mathrm{free}}$ and $I^{jc}_{kb}$ for $k \in [m_j]$, $b \in \{0,1\}$.
For $j \notin S$, let $r_j = s_j$.

Let $I_S = \bigcup_{j \in S,\, c \in \{0,1\}} I_{jc}$. Define $\varsigma : I_S \to [n] \times \{0,1\}$ by $\varsigma(i) = (j, c)$ if $i \in I_{jc}$. The idea is that
$\varsigma$ maps $i$ to the input span program that must evaluate to 1 in order for $|v_i\rangle$ to be available in $P$.

There are several ways of composing the span programs $P$ and $P^{jc}$ to obtain a span program $Q$
computing the composed function $f_Q = g$ with $\mathrm{wsize}_s(Q) \le \mathrm{wsize}_r(P)$ [Rei09a, Defs. 4.4, 4.5, 4.6].
We focus on direct-sum composition.

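Definition 2.5 ([Rei09a, Def. 4.5]). The direct-sum-composed span program $Q^\oplus$ is defined by:

• The inner product space is $V^\oplus = V \oplus \bigoplus_{j \in S,\, c \in \{0,1\}} (\mathbf{C}^{I_{jc}} \otimes V^{jc})$. Any vector in $V^\oplus$ can be
uniquely expressed as $|u\rangle_V + \sum_{i \in I_S} |i\rangle \otimes |u_i\rangle$, where $|u\rangle \in V$ and $|u_i\rangle \in V^{\varsigma(i)}$.

• The target vector is $|t^\oplus\rangle = |t\rangle_V$.

• The free input vectors are indexed by $I^\oplus_{\mathrm{free}} = I_{\mathrm{free}} \cup I_S \cup \bigcup_{j \in S,\, c \in \{0,1\}} (I_{jc} \times I^{jc}_{\mathrm{free}})$ with, for
$i \in I^\oplus_{\mathrm{free}}$,

$$|v^\oplus_i\rangle = \begin{cases}
|v_i\rangle_V & \text{if } i \in I_{\mathrm{free}} \\
|v_i\rangle_V - |i\rangle \otimes |t^{jc}\rangle & \text{if } i \in I_{jc} \text{ and } j \in S \\
|i'\rangle \otimes |v_{i''}\rangle & \text{if } i = (i', i'') \in I_{jc} \times I^{jc}_{\mathrm{free}}
\end{cases} \tag{2.10}$$

• The other input vectors are indexed by $I^\oplus_{(j,k),b}$ for $j \in [n]$, $k \in [m_j]$, $b \in \{0,1\}$. For $j \notin S$,
$I^\oplus_{(j,1),b} = I_{jb}$, with $|v^\oplus_i\rangle = |v_i\rangle_V$ for $i \in I_{jb}$. For $j \in S$, $I^\oplus_{(j,k),b} = \bigcup_{c \in \{0,1\}} (I_{jc} \times I^{jc}_{kb})$. For
$i \in I_{jc}$ and $i' \in I^{jc}_{kb}$,

$$|v^\oplus_{ii'}\rangle = |i\rangle \otimes |v_{i'}\rangle . \tag{2.11}$$

By [Rei09a, Theorem 4.3], $f_{Q^\oplus} = g$ and $\mathrm{wsize}_s(Q^\oplus) \le \mathrm{wsize}_r(P)$. (While that theorem is stated
only for the case $S = [n]$, it is trivially extended to other $S \subseteq [n]$.) We give a bound on how quickly
the full witness size can grow relative to the witness size: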

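Lemma 2.6. Under the above conditions, for each input $x \in \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_n}$, with $y = y(x)$,

• If $g(x) = 1$, let $|w\rangle$ be a witness to $f_P(y) = 1$ such that $\sum_{j \in [n],\, i \in I_{j y_j}} r_j |w_i|^2 = \mathrm{wsize}_r(P, y)$.
Then

$$\frac{\mathrm{fwsize}_s(Q^\oplus, x)}{\mathrm{wsize}_r(P, y)} \le \sigma(y, |w\rangle) + \frac{1 + \sum_{i \in I_{\mathrm{free}}} |w_i|^2}{\mathrm{wsize}_r(P, y)} ,
\quad \text{where } \sigma(y, |w\rangle) = \max_{\substack{j \in S :\ \exists i \in I_{j y_j} \\ \text{with } \langle i|w\rangle \ne 0}} \frac{\mathrm{fwsize}_{s_j}(P^{j y_j})}{\mathrm{wsize}_{s_j}(P^{j y_j})} . \tag{2.12}$$

• If $g(x) = 0$, let $|w'\rangle$ be a witness to $f_P(y) = 0$ such that $\sum_{j \in [n],\, i \in I_{j \bar{y}_j}} r_j |\langle w'|v_i\rangle|^2 = \mathrm{wsize}_r(P, y)$.
Then

$$\frac{\mathrm{fwsize}_s(Q^\oplus, x)}{\mathrm{wsize}_r(P, y)} \le \sigma(\bar{y}, |w'\rangle) + \frac{\||w'\rangle\|^2}{\mathrm{wsize}_r(P, y)} ,
\quad \text{where } \sigma(\bar{y}, |w'\rangle) = \max_{\substack{j \in S :\ \exists i \in I_{j \bar{y}_j} \\ \text{with } \langle v_i|w'\rangle \ne 0}} \frac{\mathrm{fwsize}_{s_j}(P^{j \bar{y}_j})}{\mathrm{wsize}_{s_j}(P^{j \bar{y}_j})} . \tag{2.13}$$

If $S = \emptyset$, then $\sigma(y, |w\rangle)$ and $\sigma(\bar{y}, |w'\rangle)$ should each be taken to be 1 in the above equations.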

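Proof. We follow the proof of [Rei09a, Theorem 4.3], except keeping track of the full witness size.
Note that if $S = \emptyset$, then Eqs. (2.12) and (2.13) are immediate by definition of $\mathrm{fwsize}_s(Q^\oplus, x)$.
Let $I(y)' = I(y) \setminus I_{\mathrm{free}} = \bigcup_{j \in [n]} I_{j y_j}$.

In the first case, $g(x) = 1$, for $j \in S$ let $|w^{j y_j}\rangle \in \mathbf{C}^{I^{j y_j}}$ be a witness to $f_{P^{j y_j}}(x_j) = 1$ such that
$\mathrm{fwsize}_{s_j}(P^{j y_j}, x_j) = 1 + \sum_{i \in I^{j y_j}_{\mathrm{free}}} |w^{j y_j}_i|^2 + \sum_{k \in [m_j],\, i \in I^{j y_j}_{k (x_j)_k}} (s_j)_k |w^{j y_j}_i|^2$. As in [Rei09a, Theorem 4.3],
let $|w^\oplus\rangle \in \mathbf{C}^{I^\oplus(x)}$ be given by

$$w^\oplus_i = \begin{cases}
w_i & \text{if } i \in I(y) \\
w_{i'}\, w^{\varsigma(i')}_{i''} & \text{if } i = (i', i'') \text{ with } i' \in I(y)' \cap I_S,\ i'' \in I^{\varsigma(i')}(x) \\
0 & \text{otherwise}
\end{cases} \tag{2.14}$$

Then $|w^\oplus\rangle$ is a witness for $f_{Q^\oplus}(x) = 1$, and we compute

$$\begin{aligned}
\mathrm{fwsize}_s(Q^\oplus, x) &\le 1 + \sum_{i \in I^\oplus_{\mathrm{free}}} |w^\oplus_i|^2 + \sum_{\substack{j \in [n],\, k \in [m_j], \\ i \in I^\oplus_{(j,k),(x_j)_k}}} (s_j)_k |w^\oplus_i|^2 \\
&= 1 + \sum_{i \in I_{\mathrm{free}}} |w_i|^2 + \sum_{j \in [n] \setminus S,\, i \in I_{j x_j}} s_j |w_i|^2 \\
&\qquad + \sum_{j \in S,\, i \in I_{j y_j}} |w_i|^2 \Big(1 + \sum_{i' \in I^{j y_j}_{\mathrm{free}}} |w^{j y_j}_{i'}|^2 + \sum_{k \in [m_j],\, i' \in I^{j y_j}_{k (x_j)_k}} (s_j)_k |w^{j y_j}_{i'}|^2\Big) \\
&= 1 + \sum_{i \in I_{\mathrm{free}}} |w_i|^2 + \sum_{j \in [n] \setminus S,\, i \in I_{j x_j}} s_j |w_i|^2 + \sum_{j \in S,\, i \in I_{j y_j}} |w_i|^2\, \mathrm{fwsize}_{s_j}(P^{j y_j}, x_j) .
\end{aligned} \tag{2.15}$$

Eq. (2.12) follows using the bound $\mathrm{fwsize}_{s_j}(P^{j y_j}, x_j) \le \sigma(y, |w\rangle)\, r_j$ for $j \in S$, and $s_j = r_j$ for $j \notin S$.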

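Next consider the case $g(x) = 0$. For $j \in S$, let $|u^{j \bar{y}_j}\rangle \in V^{j \bar{y}_j}$ be a witness for $f_{P^{j \bar{y}_j}}(x_j) = 0$ with
$\mathrm{fwsize}_{s_j}(P^{j \bar{y}_j}, x_j) = \||u^{j \bar{y}_j}\rangle\|^2 + \sum_{k \in [m_j],\, i \in I^{j \bar{y}_j}_{k \overline{(x_j)_k}}} (s_j)_k |\langle v_i|u^{j \bar{y}_j}\rangle|^2$. As in [Rei09a, Theorem 4.3], let

$$|u^\oplus\rangle = |w'\rangle_V + \sum_{i \in I_S \setminus I(y)} \langle v_i|w'\rangle\, |i\rangle \otimes |u^{\varsigma(i)}\rangle . \tag{2.16}$$

Then $|u^\oplus\rangle$ is a witness for $f_{Q^\oplus}(x) = 0$, and, moreover,

$$\begin{aligned}
\mathrm{fwsize}_s(Q^\oplus, x) &\le \||u^\oplus\rangle\|^2 + \sum_{\substack{j \in [n],\, k \in [m_j], \\ i \in I^\oplus_{(j,k),\overline{(x_j)_k}}}} (s_j)_k |\langle v^\oplus_i|u^\oplus\rangle|^2 \\
&= \||u^\oplus\rangle\|^2 + \sum_{j \in [n] \setminus S,\, i \in I_{j \bar{x}_j}} s_j |\langle v_i|u^\oplus\rangle|^2 + \sum_{\substack{j \in S,\, k \in [m_j], \\ i \in I^\oplus_{(j,k),\overline{(x_j)_k}}}} (s_j)_k |\langle v^\oplus_i|u^\oplus\rangle|^2 \\
&= \||w'\rangle\|^2 + \sum_{j \in [n] \setminus S,\, i \in I_{j \bar{x}_j}} s_j |\langle v_i|w'\rangle|^2 \\
&\qquad + \sum_{j \in S,\, i \in I_{j \bar{y}_j}} |\langle v_i|w'\rangle|^2 \Big(\||u^{j \bar{y}_j}\rangle\|^2 + \sum_{k \in [m_j],\, i' \in I^{j \bar{y}_j}_{k \overline{(x_j)_k}}} (s_j)_k |\langle v_{i'}|u^{j \bar{y}_j}\rangle|^2\Big) \\
&= \||w'\rangle\|^2 + \sum_{j \in [n] \setminus S,\, i \in I_{j \bar{x}_j}} r_j |\langle v_i|w'\rangle|^2 + \sum_{j \in S,\, i \in I_{j \bar{y}_j}} |\langle v_i|w'\rangle|^2\, \mathrm{fwsize}_{s_j}(P^{j \bar{y}_j}, x_j) .
\end{aligned} \tag{2.17}$$

Eq. (2.13) follows using the bound $\mathrm{fwsize}_{s_j}(P^{j \bar{y}_j}, x_j) \le \sigma(\bar{y}, |w'\rangle)\, r_j$ for $j \in S$. □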

Lemma 2.6 is a key step in the formula-evaluation results in this article and [Rei09b]. It
is used to track the full witness size for span programs recursively composed in a direct-sum
manner along a formula. The proof of Theorem 1.9 will require the lemma with the weaker
bounds $\sigma(y, |w\rangle), \sigma(\bar{y}, |w'\rangle) \le \max_{j \in S,\, c \in \{0,1\}} \mathrm{fwsize}_{s_j}(P^{jc}) / \mathrm{wsize}_{s_j}(P^{jc})$. Theorem 1.11 will use
only the slightly stronger bounds $\sigma(y, |w\rangle) \le \max_{j \in S} \mathrm{fwsize}_{s_j}(P^{j y_j}) / \mathrm{wsize}_{s_j}(P^{j y_j})$, $\sigma(\bar{y}, |w'\rangle) \le
\max_{j \in S} \mathrm{fwsize}_{s_j}(P^{j \bar{y}_j}) / \mathrm{wsize}_{s_j}(P^{j \bar{y}_j})$. However, the proof of [Rei09b, Theorem 1.1] will require the
bounds of Eqs. (2.12) and (2.13).



3 Evaluation of almost-balanced formulas 

In this section, we will apply the span program framework from [Rei09a] to prove Theorem 1.9.
Our algorithm will be given by applying Theorem 2.4 to a certain span program. Before beginning
the proof, though, we will give two necessary lemmas.

Consider a span program $P$ with corresponding weighted graph $G_P$, from [Rei09a, Def. 8.2].
We will need a bound on the operator norm of $\mathrm{abs}(A_{G_P})$, the entry-wise absolute value of the
weighted adjacency matrix $A_{G_P}$. If $P$ is canonical [Rei09a, Def. 5.1], then we can indeed obtain
such a bound in terms of the witness size of $P$:

Lemma 3.1. Let $s \in (0,\infty)^k$, and let $P$ be a canonical span program computing a function $f :
\{0,1\}^k \to \{0,1\}$ with input vectors indexed by the set $I$. Assume that for each $x \in \{0,1\}^k$ with
$f(x) = 0$, an optimal witness to $f_P(x) = 0$ is $|x\rangle$ itself. Then

$$\|\mathrm{abs}(A_{G_P})\| \le 2^k \Big(1 + \frac{\mathrm{wsize}_s(P)}{\min_{j \in [k]} s_j}\Big) + |I| . \tag{3.1}$$

Proof. Recall from [Rei09a, Def. 5.1] that $P$ being in canonical form implies that its target vector
is $|t\rangle = \sum_{x :\, f(x) = 0} |x\rangle$ and that the matrix $A$ whose columns are the input vectors of $P$ can be
expressed as

$$A = \sum_{i \in I} |v_i\rangle\langle i| = \sum_{j \in [k],\ x :\, f(x) = 0} |x\rangle\langle j, \bar{x}_j| \otimes \langle v_{xj}| . \tag{3.2}$$

By assumption, for each $x \in f^{-1}(0)$,

$$\sum_{j \in [k]} s_j \||v_{xj}\rangle\|^2 = \mathrm{wsize}_s(P, x) \le \mathrm{wsize}_s(P) . \tag{3.3}$$

In particular, letting $\sigma = \min_{j \in [k]} s_j > 0$, we can bound

$$\sum_{j \in [k]} \||v_{xj}\rangle\|^2 \le \frac{\mathrm{wsize}_s(P)}{\sigma} . \tag{3.4}$$

The rest of the argument follows from the definition of the weighted adjacency matrix $A_{G_P}$.
From [Rei09a, Def. 8.1, Prop. 8.8], $\|\mathrm{abs}(A_{G_P})\| \le \|\mathrm{abs}(B_{G_P})\|^2$, where $B_{G_P}$ is the biadjacency
matrix corresponding to $P$,

$$B_{G_P} = \begin{pmatrix} |t\rangle & A \\ 0 & \mathbf{1} \end{pmatrix} , \tag{3.5}$$

and $\mathbf{1}$ is an $|I| \times |I|$ identity matrix. Now bound $\|\mathrm{abs}(B_{G_P})\|$ by its Frobenius norm:

$$\begin{aligned}
\|\mathrm{abs}(A_{G_P})\| &\le \|\mathrm{abs}(B_{G_P})\|^2 \le \|\mathrm{abs}(B_{G_P})\|_F^2 \\
&= \||t\rangle\|^2 + \sum_{x :\, f(x) = 0,\ j \in [k]} \||v_{xj}\rangle\|^2 + |I| \\
&\le 2^k + 2^k \max_{x :\, f(x) = 0} \sum_{j \in [k]} \||v_{xj}\rangle\|^2 + |I| .
\end{aligned} \tag{3.6}$$

Eq. (3.1) follows by substituting in Eq. (3.4). □

An important quantity in the proof of Theorem 1.9 will be $\sigma_-(\varphi)$, from Definition 1.10. For an
almost-balanced formula $\varphi$, $\sigma_-(\varphi) = O(1)$.

Lemma 3.2. Consider a $\beta$-balanced formula $\varphi$ over a gate set $\mathcal{S}$ in which every gate depends on
at least two input bits. Then for every vertex $v$, with children $c_1, c_2, \ldots, c_k$,

$$\frac{\mathrm{Adv}^{\pm}(\varphi_v)}{\max_j \mathrm{Adv}^{\pm}(\varphi_{c_j})} \ge \sqrt{1 + \frac{1}{\beta^2}} . \tag{3.7}$$

In particular,

$$\sigma_-(\varphi) \le (2 + \sqrt{2})\beta^2 . \tag{3.8}$$

Proof. Consider a vertex $v$ with corresponding gate $g = g_v : \{0,1\}^k \to \{0,1\}$. By Theorem 1.3,
$\mathrm{Adv}^{\pm}(\varphi_v) = \mathrm{Adv}^{\pm}_s(g)$, where $s_j = \mathrm{Adv}^{\pm}(\varphi_{c_j})$. It is immediate from the definitions that $\mathrm{Adv}^{\pm}_s(g) \ge
\mathrm{Adv}_s(g)$. We will show that $\mathrm{Adv}_s(g) \ge \sqrt{1 + 1/\beta^2}\,(\max_j s_j)$, using that $\max_j s_j / \min_j s_j \le \beta$.
Use the weighted minimax formulation of the adversary bound from [HLS07, Theorem 18]:

$$\mathrm{Adv}_s(g) = \min_p \max_{\substack{x, y \in \{0,1\}^k : \\ g(x) \ne g(y)}} \frac{1}{\sum_{j :\, x_j \ne y_j} \sqrt{p_x(j)\, p_y(j)} / s_j} , \tag{3.9}$$

where the minimization is over all choices of probability distributions $p_x$ over $[k]$ for $x \in \{0,1\}^k$.

Since the adversary bound is monotone increasing in each weight, the worst case is when all
but one of the weights are equal to $\max_j s_j / \beta$. Since for a scalar $c$, $\mathrm{Adv}_{cs}(g) = c\,\mathrm{Adv}_s(g)$, we may
scale so that one weight is $\beta$ and all other weights are 1. Assume that the first weight is $s_1 = \beta$;
the other $k - 1$ cases, $s_2 = \beta$ and so on, are symmetrical. Assume also that $g$ depends on the first
bit; otherwise $\mathrm{Adv}_s(g)$ will not depend on $s_1$ so one of the other cases will be worse. Therefore,
there exist inputs $x, y \in \{0,1\}^k$ that differ only on the first bit, but for which $g(x) \ne g(y)$.

Since the function $g$ depends on at least two input bits, there also exists a third input $z \in \{0,1\}^k$
with $x_1 = z_1$ but $g(z) = g(y) \ne g(x)$. Indeed, if $g(z) = g(x)$ for every $z$ with $z_1 = x_1$, and if
$g(z) = g(y)$ for every $z$ with $z_1 = y_1$, then $g$ depends only on the first bit.

By Eq. (3.9),

$$\mathrm{Adv}_s(g) \ge \min_{p_x, p_y, p_z} \max\Big\{ \frac{s_1}{\sqrt{p_x(1)\, p_y(1)}} ,\ \frac{1}{\sum_{j :\, x_j \ne z_j} \sqrt{p_x(j)\, p_z(j)} / s_j} \Big\} , \tag{3.10}$$

where the minimization is over only the three probability distributions $p_x$, $p_y$ and $p_z$. In the above
expression, we may clearly take $p_y(1) = 1$ and $p_y(j) = 0$ for $j \ge 2$. We may also use the Cauchy-Schwarz
inequality to bound the second term above, and finally substitute $s_1 = \beta$, $s_j = 1$ for $j \ge 2$
to obtain

$$\mathrm{Adv}_s(g) \ge \min_{p_x} \max\Big\{ \frac{\beta}{\sqrt{p_x(1)}} ,\ \frac{1}{\sqrt{1 - p_x(1)}} \Big\} . \tag{3.11}$$

The optimum is achieved for $p_x(1) = \beta^2 / (1 + \beta^2)$, so $\mathrm{Adv}_s(g) \ge \sqrt{1 + \beta^2}$, as claimed.

To derive Eq. (3.8), note that $\beta \ge 1$ necessarily. Then the sum $\sigma_-(\varphi)$ is dominated by the
geometric series

$$\sum_{k=0}^{\infty} \Big(1 + \frac{1}{\beta^2}\Big)^{-k/2} , \tag{3.12}$$

which is at most $(2 + \sqrt{2})\beta^2$, with equality at $\beta = 1$. □
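
The two closing steps of this proof are easy to check numerically. The sketch below grid-minimises the right-hand side of Eq. (3.11) over $p_x(1)$ and sums the geometric series of Eq. (3.12) in closed form; the function names and the grid resolution are arbitrary choices made here, and the check is an illustration, not part of the proof.

```python
import math

def minimax_value(beta, steps=20000):
    """Grid-minimise max(beta/sqrt(p), 1/sqrt(1-p)) over p in (0, 1)."""
    best = math.inf
    for i in range(1, steps):
        p = i / steps
        best = min(best, max(beta / math.sqrt(p), 1 / math.sqrt(1 - p)))
    return best

def geometric_sum(beta):
    """Closed form of sum_{k>=0} (1 + 1/beta^2)^(-k/2)."""
    ratio = (1 + 1 / beta**2) ** -0.5
    return 1 / (1 - ratio)

for beta in (1.0, 2.0, 5.0):
    # Expect the first two columns to agree, and the third to be at most the fourth.
    print(beta, minimax_value(beta), math.sqrt(1 + beta**2),
          geometric_sum(beta), (2 + math.sqrt(2)) * beta**2)
```

At $p_x(1) = \beta^2/(1+\beta^2)$ both terms inside the maximum equal $\sqrt{1+\beta^2}$, which is why the grid search settles there.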

Note that the 1-balanced formulas over $\mathcal{S} = \{\mathrm{OR}_2\}$ satisfy the inequality (3.7) with equality
and come arbitrarily close to saturating the inequality (3.8).

With Lemma 3.1 and Lemma 3.2 in hand, we are ready to prove Theorem 1.9. 

Proof of Theorem 1.9. First of all, we may assume without loss of generality that every gate in $\mathcal{S}$
depends on at least two input bits. Indeed, if a gate $g : \{0,1\}^k \to \{0,1\}$ depends on no input
bits, i.e., is the constant 0 or constant 1 function, then $g$ can be eliminated from any formula
over $\mathcal{S}$ without changing the adversary balance condition, since $\mathrm{Adv}^{\pm}_s(g) = 0$ for all cost vectors
$s \in [0,\infty)^k$. If a gate $g : \{0,1\}^k \to \{0,1\}$ depends only on one input bit, say the first bit, then
$\mathrm{Adv}^{\pm}_s(g) = s_1$ for all cost vectors $s$, and therefore similarly $g$ can be eliminated without affecting
the adversary balance condition.

Consider $\varphi$ an $n$-variable, $\beta$-balanced, read-once formula over the finite gate set $\mathcal{S}$. Let $r$ be the
root of $\varphi$. We begin by recursively constructing a span program $P_\varphi$ that computes $\varphi$ and has witness
size $\mathrm{wsize}(P_\varphi) = \mathrm{Adv}^{\pm}(\varphi)$. $P_\varphi$ is constructed using direct-sum composition of span programs for
each node in $\varphi$. (Direct-sum composition is also the composition method used in [RS08].)

The construction works recursively, starting at the leaves of $\varphi$ and moving toward the root.
Consider an internal vertex $v$, with children $c_1, \ldots, c_k$. Let $\alpha_j = \mathrm{Adv}^{\pm}(\varphi_{c_j})$, where $\varphi_{c_j}$ is the
subformula of $\varphi$ rooted at $c_j$ (Definition 1.8). In particular, if $c_j$ is a leaf, then $\alpha_j = 1$. Assume
that for $j \in [k]$ we have inductively constructed span programs $P_{\varphi_{c_j}}$ and $P^\dagger_{\varphi_{c_j}}$ computing $\varphi_{c_j}$
and $\neg\varphi_{c_j}$, respectively, with $\mathrm{wsize}(P_{\varphi_{c_j}}) = \mathrm{wsize}(P^\dagger_{\varphi_{c_j}}) = \alpha_j$. Apply [Rei09a, Theorem 6.1], a
generalization of Theorem 1.5, twice to obtain span programs $P_v$ and $P^\dagger_v$ computing $f_{P_v} = g_v$ and
$f_{P^\dagger_v} = \neg g_v$, with $\mathrm{wsize}_\alpha(P_v) = \mathrm{wsize}_\alpha(P^\dagger_v) = \mathrm{Adv}^{\pm}_\alpha(g_v) = \mathrm{Adv}^{\pm}(\varphi_v)$.

Then let $P_{\varphi_v}$ and $P^\dagger_{\varphi_v}$ be the direct-sum-composed span programs of $P_v$ and $P^\dagger_v$, respectively,
with the span programs $P_{\varphi_{c_j}}$, $P^\dagger_{\varphi_{c_j}}$ according to the formula $\varphi$. By definition of direct-sum
composition, the graph $G_{P_{\varphi_v}}$ is built by replacing the input edges of $G_{P_v}$ with the graphs $G_{P_{\varphi_{c_j}}}$ or $G_{P^\dagger_{\varphi_{c_j}}}$;
and similarly for $G_{P^\dagger_{\varphi_v}}$. Some examples are given in [Rei09a, App. B] and in [RS08]. By [Rei09a,
Theorem 4.3], $P_{\varphi_v}$ (resp. $P^\dagger_{\varphi_v}$) computes $\varphi_v$ ($\neg\varphi_v$) with $\mathrm{wsize}(P_{\varphi_v}) = \mathrm{wsize}(P^\dagger_{\varphi_v}) = \mathrm{Adv}^{\pm}(\varphi_v)$.

Let $P_\varphi = P_{\varphi_r}$. We wish to apply Theorem 2.4 to $P_\varphi$ to obtain a quantum algorithm, but to do so
will need some more properties of the span programs $P_v$ and $P^\dagger_v$. Recall from [Rei09a, Theorem 5.2]
that each $P_v$ may be assumed to be in canonical form, satisfying in particular that for any input
$y \in \{0,1\}^k$ with $g_v(y) = 0$ an optimal witness is $|y\rangle \in \mathbf{C}^{g_v^{-1}(0)}$ itself. Therefore, Lemma 3.1 applies,
and we obtain

$$\|\mathrm{abs}(A_{G_{P_v}})\| \le 2^k \Big(1 + \frac{\mathrm{wsize}_\alpha(P_v)}{\min_j \alpha_j}\Big) + |I| , \tag{3.13}$$

where $|I|$ is the number of input vectors in $P_v$. Now use

$$\frac{\mathrm{wsize}_\alpha(P_v)}{\min_j \alpha_j} = \frac{\max_j \alpha_j}{\min_j \alpha_j} \cdot \frac{\mathrm{Adv}^{\pm}_\alpha(g_v)}{\max_j \alpha_j} \le \beta k , \tag{3.14}$$

where we have applied Eq. (1.9) and also $\mathrm{Adv}^{\pm}_\alpha(g_v) / \max_j \alpha_j \le \mathrm{Adv}^{\pm}(g_v) \le k$. Additionally,
by [Rei09a, Lemma 6.6], we may assume that $|I| \le 2k^2 2^k$. Thus

$$\|\mathrm{abs}(A_{G_{P_v}})\| = \beta\, 2^{O(k)} . \tag{3.15}$$

By repeating this argument for the negated function $\neg g_v$ computed by a dual span program $P^\dagger_v$
([Rei09a, Lemma 4.1]), we also have $\|\mathrm{abs}(A_{G_{P^\dagger_v}})\| = \beta\, 2^{O(k)}$.
A consequence is that

$$\|\mathrm{abs}(A_{G_{P_\varphi}})\| = \beta\, 2^{O(k_{\max})} \tag{3.16}$$

where $k_{\max}$ is the maximum fan-in of any gate used in $\varphi$. Indeed, $G_{P_\varphi}$ is built by "plugging together"
the graphs $G_{P_v}$ and $G_{P^\dagger_v}$ for the different vertices $v$. Split the graph $G_{P_\varphi}$ into two pieces, $G_0$ and
$G_1$, comprising those subgraphs $G_{P_v}$ and $G_{P^\dagger_v}$ for which the distance of $v$ from $r$ is even or odd,
respectively. Then $\|\mathrm{abs}(A_{G_{P_\varphi}})\| \le \|\mathrm{abs}(A_{G_0})\| + \|\mathrm{abs}(A_{G_1})\|$. Since each $G_b$ is the disconnected
union of graphs $G_{P_v}$ and $G_{P^\dagger_v}$, $\|\mathrm{abs}(A_{G_b})\| \le \max_v \max\{\|\mathrm{abs}(A_{G_{P_v}})\|, \|\mathrm{abs}(A_{G_{P^\dagger_v}})\|\}$.
v P v 

Let us bound the full witness size of $P_\varphi$.

Lemma 3.3. Let $v$ be a vertex of $\varphi$. Then

$$\max\{\mathrm{fwsize}(P_{\varphi_v}), \mathrm{fwsize}(P^\dagger_{\varphi_v})\} \le \sigma_-(v)\, \mathrm{Adv}^{\pm}(\varphi_v) . \tag{3.17}$$

Proof. The proof is by induction in the maximum distance from $v$ to a leaf. The base case, that all
of $v$'s inputs are themselves leaves, is by definition of $P_v$ and $P^\dagger_v$, since then $\sigma_-(v) = 1 + 1/\mathrm{Adv}^{\pm}(\varphi_v)$.
Let $v$ have children $c_1, \ldots, c_k$. By Lemma 2.6 with $s = \vec{1}$ and $S = \{j \in [k] : c_j \text{ is not a leaf}\}$,

$$\frac{\mathrm{fwsize}(P_{\varphi_v})}{\mathrm{Adv}^{\pm}(\varphi_v)} \le \frac{1}{\mathrm{Adv}^{\pm}(\varphi_v)} + \max_{j \in S} \max\Big\{ \frac{\mathrm{fwsize}(P_{\varphi_{c_j}})}{\mathrm{Adv}^{\pm}(\varphi_{c_j})} ,\ \frac{\mathrm{fwsize}(P^\dagger_{\varphi_{c_j}})}{\mathrm{Adv}^{\pm}(\varphi_{c_j})} \Big\} . \tag{3.18}$$

In the case $\varphi_v(x) = 1$, this follows since $P_v$ is strict, so in Eq. (2.12) the sum over $I_{\mathrm{free}}$ is zero. In
the case $\varphi_v(x) = 0$, this follows since $P_v$ is in canonical form, so in Eq. (2.13), $\||w'\rangle\| = 1$.

Now by induction, the right-hand side is at most $\mathrm{Adv}^{\pm}(\varphi_v)^{-1} + \max_{j \in S} \sigma_-(c_j) = \sigma_-(v)$. □

In particular, applying Lemma 3.3 for the case $v = r$, we find

$$\mathrm{fwsize}(P_\varphi) \le \sigma_-(\varphi)\, \mathrm{Adv}^{\pm}(\varphi) = O(\beta^2\, \mathrm{Adv}^{\pm}(\varphi)) \tag{3.19}$$

since $\sigma_-(\varphi) = O(\beta^2)$ by Lemma 3.2. Combining Eqs. (3.16) and (3.19) gives

$$\mathrm{fwsize}(P_\varphi)\, \|\mathrm{abs}(A_{G_{P_\varphi}})\| = \beta^3\, 2^{O(k_{\max})}\, \mathrm{Adv}^{\pm}(\varphi) . \tag{3.20}$$

This is $O(\mathrm{Adv}^{\pm}(\varphi))$; since the gate set $\mathcal{S}$ is fixed and finite, $k_{\max} = O(1)$. Theorem 1.9 now follows
from Theorem 2.4. □

Note that the lost constant in the theorem grows cubically in the balance parameter $\beta$ and
exponentially in the maximum fan-in $k_{\max}$ of a gate in $\mathcal{S}$. It is conceivable that this exponential
dependence can be improved.

For future reference, we state separately the bound used above to derive Eq. (3.16). 

Lemma 3.4. If $P_\varphi$ is the direct-sum composition along a formula $\varphi$ of span programs $P_v$ and $P^\dagger_v$,
then

$$\|\mathrm{abs}(A_{G_{P_\varphi}})\| \le 2 \max_v \max\{\|\mathrm{abs}(A_{G_{P_v}})\|, \|\mathrm{abs}(A_{G_{P^\dagger_v}})\|\} . \tag{3.21}$$

If the span programs $P_v$ are monotone, then $\|\mathrm{abs}(A_{G_{P_\varphi}})\| \le 2 \max_v \|\mathrm{abs}(A_{G_{P_v}})\|$.

The claim for monotone span programs follows because then the dual span programs $P^\dagger_v$ are
not used in $P_\varphi$.

4 Evaluation of approximately balanced AND-OR formulas



The proof of Theorem 1.11 will again be a consequence of Lemma 2.6 and Theorem 2.4. 

We will use the following strict, monotone span programs for fan-in-two AND and OR gates: 

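Definition 4.1. For $s_1, s_2 > 0$, define span programs $P_{\mathrm{AND}}(s_1, s_2)$ and $P_{\mathrm{OR}}(s_1, s_2)$ computing
$\mathrm{AND}_2$ and $\mathrm{OR}_2$, $\{0,1\}^2 \to \{0,1\}$, respectively, by

$$P_{\mathrm{AND}}(s_1, s_2) : \quad |t\rangle = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} , \quad |v_1\rangle = \begin{pmatrix} \beta_1 \\ 0 \end{pmatrix} , \quad |v_2\rangle = \begin{pmatrix} 0 \\ \beta_2 \end{pmatrix} \tag{4.1}$$

$$P_{\mathrm{OR}}(s_1, s_2) : \quad |t\rangle = \delta , \quad |v_1\rangle = \epsilon_1 , \quad |v_2\rangle = \epsilon_2 \tag{4.2}$$

Both span programs have $I_{1,1} = \{1\}$, $I_{2,1} = \{2\}$ and $I_{\mathrm{free}} = I_{1,0} = I_{2,0} = \emptyset$. Here the parameters
$\alpha_j$, $\beta_j$, $\delta$, $\epsilon_j$, for $j \in [2]$, are given by

$$\alpha_j = (s_j / s_p)^{1/4} , \qquad \beta_j = 1 \tag{4.3}$$

$$\delta = 1 , \qquad \epsilon_j = (s_j / s_p)^{1/4} \tag{4.4}$$

where $s_p = s_1 + s_2$. Let $\alpha = \sqrt{\alpha_1^2 + \alpha_2^2}$ and $\epsilon = \sqrt{\epsilon_1^2 + \epsilon_2^2}$.

Note that $\alpha, \epsilon \in (1, 2^{1/4}]$. They are largest when $s_1 = s_2$.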
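Claim 4.2. The span programs $P_{\mathrm{AND}}(s_1, s_2)$ and $P_{\mathrm{OR}}(s_1, s_2)$ satisfy:

$$\mathrm{wsize}_{(\sqrt{s_1}, \sqrt{s_2})}(P_{\mathrm{AND}}, x) = \begin{cases} \sqrt{s_p} & \text{if } x \in \{11, 10, 01\} \\ \sqrt{s_p}/2 & \text{if } x = 00 \end{cases}
\qquad
\mathrm{wsize}_{(\sqrt{s_1}, \sqrt{s_2})}(P_{\mathrm{OR}}, x) = \begin{cases} \sqrt{s_p} & \text{if } x \in \{00, 10, 01\} \\ \sqrt{s_p}/2 & \text{if } x = 11 \end{cases} \tag{4.5}$$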
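
Claim 4.2 can be checked numerically from Definitions 2.3 and 4.1. The sketch below uses the closed-form solution of the small quadratic minimisations in Definition 2.3 (minimising $\sum_j c_j w_j^2$ subject to $\sum_j a_j w_j = 1$ gives $1/\sum_j a_j^2/c_j$), with $\beta_j = \delta = 1$; the function names are hypothetical and the code is an illustration rather than the article's derivation.

```python
import math

def params(s1, s2):
    """Return s_p and the common values alpha_j = epsilon_j = (s_j/s_p)^(1/4)."""
    sp = s1 + s2
    return sp, [(s / sp) ** 0.25 for s in (s1, s2)]

def wsize_and(s1, s2, x):
    sp, a = params(s1, s2)
    c = [math.sqrt(s1), math.sqrt(s2)]          # costs (sqrt(s1), sqrt(s2))
    if x == (1, 1):                             # unique witness w_j = alpha_j
        return sum(c[j] * a[j] ** 2 for j in range(2))
    zeros = [j for j in range(2) if x[j] == 0]  # |w'> lives on coordinates with x_j = 0
    return 1 / sum(a[j] ** 2 / c[j] for j in zeros)

def wsize_or(s1, s2, x):
    sp, e = params(s1, s2)
    c = [math.sqrt(s1), math.sqrt(s2)]
    if x == (0, 0):                             # unique witness w' = 1/delta = 1
        return sum(c[j] * e[j] ** 2 for j in range(2))
    ones = [j for j in range(2) if x[j] == 1]   # |w> lives on the available inputs
    return 1 / sum(e[j] ** 2 / c[j] for j in ones)

for s1, s2 in [(1, 1), (3, 5), (1, 100)]:
    print(math.sqrt(s1 + s2),
          [round(wsize_and(s1, s2, x), 6) for x in [(1, 1), (1, 0), (0, 1), (0, 0)]],
          [round(wsize_or(s1, s2, x), 6) for x in [(0, 0), (1, 0), (0, 1), (1, 1)]])
```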



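Proof. These are calculations using Definition 2.3 for the witness size. Letting $\sigma = (\sqrt{s_1}, \sqrt{s_2})$,
$Q = P_{\mathrm{AND}}(s_1, s_2)$ and $R = P_{\mathrm{OR}}(s_1, s_2)$, we have

$$\begin{aligned}
\mathrm{wsize}_\sigma(Q, 11) &= \sqrt{s_1} \Big(\frac{\alpha_1}{\beta_1}\Big)^2 + \sqrt{s_2} \Big(\frac{\alpha_2}{\beta_2}\Big)^2 , &
\mathrm{wsize}_\sigma(Q, 10) &= \sqrt{s_2} \Big(\frac{\beta_2}{\alpha_2}\Big)^2 , \\
\mathrm{wsize}_\sigma(Q, 00) &= \Big[ \frac{1}{\sqrt{s_1}} \Big(\frac{\alpha_1}{\beta_1}\Big)^2 + \frac{1}{\sqrt{s_2}} \Big(\frac{\alpha_2}{\beta_2}\Big)^2 \Big]^{-1} , &
\mathrm{wsize}_\sigma(Q, 01) &= \sqrt{s_1} \Big(\frac{\beta_1}{\alpha_1}\Big)^2 ,
\end{aligned} \tag{4.6}$$

and

$$\begin{aligned}
\mathrm{wsize}_\sigma(R, 11) &= \Big[ \frac{1}{\sqrt{s_1}} \Big(\frac{\epsilon_1}{\delta}\Big)^2 + \frac{1}{\sqrt{s_2}} \Big(\frac{\epsilon_2}{\delta}\Big)^2 \Big]^{-1} , &
\mathrm{wsize}_\sigma(R, 10) &= \sqrt{s_1} \Big(\frac{\delta}{\epsilon_1}\Big)^2 , \\
\mathrm{wsize}_\sigma(R, 00) &= \sqrt{s_1} \Big(\frac{\epsilon_1}{\delta}\Big)^2 + \sqrt{s_2} \Big(\frac{\epsilon_2}{\delta}\Big)^2 , &
\mathrm{wsize}_\sigma(R, 01) &= \sqrt{s_2} \Big(\frac{\delta}{\epsilon_2}\Big)^2 .
\end{aligned} \tag{4.7}$$

It is not a coincidence that $\mathrm{wsize}_\sigma(Q, x) = \mathrm{wsize}_\sigma(R, \bar{x})$ for all $x \in \{0,1\}^2$. This can be seen as a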
consequence of De Morgan's laws and span program duality; see [Rei09a, Lemma 4.1]. □

Proof of Theorem 1.11. Let $\varphi$ be an AND-OR formula of size $n$, i.e., on $n$ input bits.

First expand out the formula so that every AND gate and every OR gate has fan-in two. This
expansion can be carried out without increasing $\sigma_-(\varphi)$ by more than a factor of 10:

Lemma 4.3 ([ACR+07, Lemma 8]). For any AND-OR formula $\varphi$, one can efficiently construct
an equivalent AND-OR formula $\varphi'$ of the same size, such that all gates in $\varphi'$ have fan-in at most
two, and $\sigma_-(\varphi') = O(\sigma_-(\varphi))$.

Therefore we may assume that $\varphi$ is a formula over fan-in-two AND and OR gates.

Now use direct-sum composition to compose the AND and OR gates according to the formula 
(f, as in the proof of Theorem 1.9. Since the span programs for AND and OR are monotone, direct- 
sum composition does not make use of dual span programs computing NAND or NOR. Therefore 
there is no need to specify these span programs. At a vertex v, set the weights s\ and S2 to equal 
the sizes of v's two input subformulas. Let P v be the span program used at vertex v, P lflv be the 
span program thus constructed for the subformula ip v , and P v be the span program constructed 
computing ip. With this choice of weights, it follows from Claim 4.2 and [Rei09a, Theorem 4.3] 
that wsize(P (/ , tl ) = Adv 1 * 1 (cp v ) = Adv(ip v ). 

Notice that for all Sl ,s 2 G [0,oo), || abs^^^j)!! = O(l) and || abs(A GpoR(si 32) )\\ = O(l). 
Therefore, by Lemma 3.4, we obtain that || abs(A<3 p ^)|| = 0(1). 

Thus to apply Theorem 2.4 we need only bound $\mathrm{fwsize}(P_\varphi)$. Lemma 3.3 does not apply, because 
for $P_{\mathrm{AND}}(s_1, s_2)$, an optimal witness $|w'\rangle$ to $f_{P_{\mathrm{AND}}}(x) = 0$ might have $\||w'\rangle\|^2 > 1$, as each $\alpha_j < 1$. 
(Lemma 3.3 would apply had we set the parameters to be $\alpha_1 = \alpha_2 = 1$, $\beta_j = (s_p/s_j)^{1/4}$, but then 
$\|A_{G_{P_\varphi}}\|$ would not necessarily be $O(1)$.) However, analogous to Lemma 3.3, we will show: 

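Lemma 4.4. Let $v$ be a vertex of $\varphi$. Then 

$$\mathrm{fwsize}(P_{\varphi_v}, x) \le \begin{cases} \sigma_-(v)\,\mathrm{Adv}(\varphi_v) & \text{if } \varphi_v(x) = 1 \\ 2\sigma_-(v)\,\mathrm{Adv}(\varphi_v) - 1 & \text{if } \varphi_v(x) = 0 \end{cases}$$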

Proof. The proof is by induction on the maximum distance from $v$ to a leaf. The base case, that 
$v$'s two inputs are themselves leaves, holds by definition of $P_v$, since then $\sigma_-(v) = 1 + 1/\sqrt{2}$. 

Let $v$ have children $c_1$ and $c_2$. We will use Lemma 2.6 with $s = 1$, $S = \{j \in [2] : c_j \text{ is not a leaf}\}$. 

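If $\varphi_v(x) = 1$, then since $P_v$ is a strict span program, i.e., $I_{\mathrm{free}} = \emptyset$, Eq. (2.12) gives 

$$\frac{\mathrm{fwsize}(P_{\varphi_v}, x)}{\mathrm{Adv}(\varphi_v)} \le \frac{1}{\mathrm{Adv}(\varphi_v)} + \max_{j \in S} \frac{\mathrm{fwsize}(P_{\varphi_{c_j}}, x)}{\mathrm{Adv}(\varphi_{c_j})} . \tag{4.9}$$

By induction, the right-hand side is at most $1/\mathrm{Adv}(\varphi_v) + \max_j \sigma_-(c_j) = \sigma_-(v)$. 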

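If $\varphi_v(x) = 0$ and $g_v$ is an OR gate, then the unique witness $|w'\rangle$ for $P_v$ has $\||w'\rangle\|^2 = 1$, from 
Definition 4.1. From Eq. (2.13) and the induction hypothesis, 

$$\frac{\mathrm{fwsize}(P_{\varphi_v}, x)}{\mathrm{Adv}^\pm(\varphi_v)} \le \frac{1}{\mathrm{Adv}(\varphi_v)} + \max_{j \in S} \Big( 2\sigma_-(c_j) - \frac{1}{\mathrm{Adv}(\varphi_{c_j})} \Big) \le 2\sigma_-(v) - \frac{1}{\mathrm{Adv}(\varphi_v)} , \tag{4.10}$$

as claimed. 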
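Therefore assume that $\varphi_v(x) = 0$ and $g_v$ is an AND gate. Let $s_1$ and $s_2$ be the sizes of the 
two input subformulas to $v$, $s_p = s_1 + s_2 = \mathrm{Adv}(\varphi_v)^2$, and assume without loss of generality 
that $\varphi_{c_1}(x) = 0$. If $\varphi_{c_2}(x) = 0$ as well, then assume without loss of generality that 
$2\sigma_-(c_1) - \frac{1}{\sqrt{s_1}} \ge 2\sigma_-(c_2) - \frac{1}{\sqrt{s_2}}$, so $\sigma(y) \le 2\sigma_-(c_1) - \frac{1}{\sqrt{s_1}}$. Then the witness $|w'\rangle$ may be taken to be 
$|w'\rangle = (1/\alpha_1, 0) = ((s_p/s_1)^{1/4}, 0)$. From Eq. (2.13), 

$$\begin{aligned} \frac{\mathrm{fwsize}(P_{\varphi_v}, x)}{\mathrm{Adv}^\pm(\varphi_v)} &\le \frac{\sqrt{s_p/s_1}}{\mathrm{Adv}^\pm(\varphi_v)} + \sigma(y) \\ &\le \frac{1}{\sqrt{s_1}} + \Big( 2\sigma_-(c_1) - \frac{1}{\sqrt{s_1}} \Big) \\ &\le 2\sigma_-(v) - \frac{1}{\mathrm{Adv}(\varphi_v)} \end{aligned} \tag{4.11}$$

as claimed. □ 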
In particular, applying Lemma 4.4 for the case $v = r$, we find 

$$\mathrm{fwsize}(P_\varphi) \le 2\sigma_-(\varphi)\,\mathrm{Adv}(\varphi) = 2\sigma_-(\varphi)\sqrt{n} . \tag{4.12}$$

Theorem 1.11 now follows from Theorem 2.4. □ 
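
As a rough numerical illustration of the bound in Eq. (4.12), the following Python sketch computes $\sigma_-(v)$ through the recursion $\sigma_-(v) = 1/\sqrt{\mathrm{size}(\varphi_v)} + \max_c \sigma_-(c)$ used in the proof of Lemma 4.4, with $\sigma_-(v) = 1$ at a leaf, and then evaluates $2\sigma_-(\varphi)\sqrt{n}$. The `Node` class and the helper names `size`, `sigma_minus`, `fwsize_bound`, `balanced` and `caterpillar` are hypothetical and introduced only for this example; the sketch evaluates the stated bound and does not construct any span program. 

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A vertex of a formula tree; a node without children is a leaf (one input bit)."""
    children: List["Node"] = field(default_factory=list)


def size(v: Node) -> int:
    """Number of leaves of the subformula rooted at v."""
    return 1 if not v.children else sum(size(c) for c in v.children)


def sigma_minus(v: Node) -> float:
    """sigma_-(v) = 1/sqrt(size(v)) + max over children of sigma_-(c); equals 1 at a leaf."""
    own = 1.0 / math.sqrt(size(v))
    if not v.children:
        return own
    return own + max(sigma_minus(c) for c in v.children)


def fwsize_bound(root: Node) -> float:
    """Right-hand side of Eq. (4.12): 2 * sigma_-(phi) * sqrt(n)."""
    return 2.0 * sigma_minus(root) * math.sqrt(size(root))


def balanced(depth: int) -> Node:
    """Balanced binary formula tree with 2**depth leaves."""
    if depth == 0:
        return Node()
    return Node([balanced(depth - 1), balanced(depth - 1)])


def caterpillar(n: int) -> Node:
    """Maximally unbalanced binary formula tree with n leaves."""
    v = Node()
    for _ in range(n - 1):
        v = Node([v, Node()])
    return v
```

On a balanced tree $\sigma_-$ stays bounded by a constant, whereas on the caterpillar tree it grows like $\sum_{k \le n} 1/\sqrt{k}$, which is the sense in which the bound degrades for unbalanced formulas. 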



5 Open problems 

In order to begin to relax the balance condition for general formulas, it seems that we need a better 
understanding of the canonical span programs. For example, can the norm bound Lemma 3.1 be 
improved? 

Although the two-sided bounded-error quantum query complexity of evaluating formulas is 
beginning to be understood, the zero-error quantum query complexity [BCdWZ99] appears to be 
more complicated. For example, the exact and zero-error quantum query complexities for $\mathrm{OR}_n$ 
are both $n$ [BBC+01]. On the other hand, Ambainis et al. [ACGT09] use the [ACR+07] algorithm 
as a subroutine in the construction of a self-certifying, zero-error quantum algorithm that makes 
$O(\sqrt{n} \log^2 n)$ queries to evaluate the balanced binary AND-OR formula. It is not known how to 
relax the balance requirement or extend the gate set. 

Can we develop further methods for constructing span programs with small full witness size, 
norm and maximum degree? A companion paper [Rei09b] studies reduced tensor-product span 
program composition in order to complement the direct-sum composition that we have used here. 

The case of formulas over non-boolean gates may be more complicated [Rei09a], but is still 
intriguing. 
A Spectral gap for approximately balanced AND-OR formulas 

It is perhaps of interest to understand why [ACR+07] imposes the unnecessary condition that 
$\sigma_+(\varphi) = O(n)$. The proof in [ACR+07] has two main cases, an eigenvalue-zero analysis of a 
certain graph, and a small, nonzero eigenvalue analysis of the graph. The quantity $\sigma_+(\varphi)$ appears 
only in the small-eigenvalue analysis, and not in the eigenvalue-zero analysis. However, [Rei09a, 
Theorem 8.7] states, roughly, that the small-eigenvalue analysis of a span program is unnecessary, 
as it follows from the eigenvalue-zero analysis. This provides a strong indication that the 
small-eigenvalue analysis in [ACR+07] is overly conservative. However, this conclusion is not certain, 
since [Rei09a, Theorem 8.7] only shows the existence of an "effective" spectral gap 
based on the eigenvalue-zero analysis, whereas [ACR+07] in fact proves an actual spectral gap. 

In this appendix, we therefore prove a stronger version of [ACR+07, Lemma 5] that does 
not depend on $\sigma_+(\varphi)$. This proof can be seen as an alternative, more direct way of proving 
Theorem 1.11, without relying on the span program framework. It confirms that the spectral gap 
is of the same size, up to constants, as the effective spectral gap from [Rei09a, Theorem 8.7]. 

Replacing [ACR+07, Lemma 5] with Lemma A.1 below gives an alternative proof of 
Theorem 1.11. We use the notation from [ACR+07]. 

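Lemma A.1. Let $0 < E \le 1/(8\sigma_-(\varphi)^3 N)^{1/2}$. For vertices $v \ne r''$ in $T$, define $y_v$ by 

$$y_v = \frac{\sqrt{s_v}\,\sigma_-(v)}{1 - \gamma_v E^2} , \tag{A.1}$$

where $\gamma_v = 4\sigma_-(\varphi)^2 s_v \sigma_-(v)$. Then for every vertex $v \ne r''$ in $T$, having parent $p$, either $a_v = a_p = 0$, 
or 

$$\begin{aligned} A(v) = 0 &\;\Rightarrow\; 0 < (h_{pv} a_p)/a_v \le y_v E , \\ A(v) = 1 &\;\Rightarrow\; 0 > a_v/(h_{pv} a_p) \ge -y_v E . \end{aligned} \tag{A.2}$$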

Proof. The proof is by induction on the height of the tree. Recall that $h_{pv} = (s_v/s_p)^{1/4}$. By [ACR+07, 
Eq. (5)], we have the equation 

$$E a_v = h_{pv} a_p + \sum_c h_{vc} a_c , \tag{A.3}$$

where the sum is over the children $c$ of $v$. By [ACR+07, Lemma 8], we may assume that $v$ has at 
most two children. 

For a leaf $v$, $A(v) = 0$ and $E a_v = h_{pv} a_p$. Thus either $a_v = a_p = 0$, or 

$$\frac{h_{pv} a_p}{a_v} = E \le y_v E . \tag{A.4}$$

This settles the base case of the induction argument. 

The induction proceeds as follows. First, consider the case that $a_v = 0$. Then the induction 
hypothesis implies that $a_c = 0$ for every child $c$ of $v$, whether $A(v)$ is 0 or 1. From Eq. (A.3), then, 
indeed $a_p = 0$. Assume now that $a_v \ne 0$. Dividing Eq. (A.3) by $a_v$ and simplifying, we find 

$$\frac{h_{pv} a_p}{a_v} = E - \sum_c h_{vc}^2 \, \frac{a_c}{h_{vc} a_v} . \tag{A.5}$$

By the induction assumption, we have 

$$-\frac{a_c}{h_{vc} a_v} \le \begin{cases} y_c E & \text{if } A(c) = 1 \\ -\dfrac{1}{y_c E} & \text{if } A(c) = 0 \end{cases} \tag{A.6}$$
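Substituting this bound into Eq. (A.5) gives 

$$\begin{aligned} \frac{h_{pv} a_p}{a_v} &\le E + \sum_{c : A(c)=1} \sqrt{\frac{s_c}{s_v}}\, y_c E - \sum_{c : A(c)=0} \sqrt{\frac{s_c}{s_v}}\, \frac{1}{y_c E} \\ &= E \Big( 1 + \sum_{c : A(c)=1} \frac{s_c\,\sigma_-(c)}{\sqrt{s_v}\,(1 - \gamma_c E^2)} \Big) - \sum_{c : A(c)=0} \sqrt{\frac{s_c}{s_v}}\, \frac{1}{y_c E} \\ &\le E \Big( \max_c \frac{1}{1 - \gamma_c E^2} \Big) \Big( 1 + \big(\max_c \sigma_-(c)\big) \sum_{c : A(c)=1} \frac{s_c}{\sqrt{s_v}} \Big) - \sum_{c : A(c)=0} \sqrt{\frac{s_c}{s_v}}\, \frac{1}{y_c E} . \end{aligned} \tag{A.7}$$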

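In particular, using that $\sum_{c : A(c)=1} s_c \le s_v$ and $\sigma_-(v) = \frac{1}{\sqrt{s_v}} + \max_c \sigma_-(c)$, we obtain the bound 

$$\frac{h_{pv} a_p}{a_v} \le y_v E - \sum_{c : A(c)=0} \sqrt{\frac{s_c}{s_v}}\, \frac{1}{y_c E} . \tag{A.8}$$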

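Using instead $\gamma_c E^2 \le 1/2$ and $\max_c \sigma_-(c) \le \sigma_-(\varphi)$, we obtain the bound 

$$\frac{h_{pv} a_p}{a_v} \le 2E \Big( 1 + \sigma_-(\varphi) \sum_{c : A(c)=1} \frac{s_c}{\sqrt{s_v}} \Big) - \sum_{c : A(c)=0} \sqrt{\frac{s_c}{s_v}}\, \frac{1}{y_c E} . \tag{A.9}$$
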
We will apply both Eq. (A.8) and Eq. (A.9) below. 

Now we consider two cases, depending on whether $A(v)$ is 0 or 1, i.e., on whether $\{c : A(c) = 0\}$ 
is empty or not. 

If $A(v) = 0$, then all children of $v$ evaluate to 1. Therefore, from Eq. (A.8), $h_{pv} a_p / a_v \le y_v E$, as 
claimed. The induction hypothesis also gives from Eq. (A.5) that $h_{pv} a_p / a_v \ge E > 0$. 

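If $A(v) = 1$, then there is a child $c^*$ with $A(c^*) = 0$. Using that $\gamma_v E^2 \le 1/2$ and $\sqrt{s_v/s_{c^*}}\, y_{c^*} E \le 
y_v E \le 1/\sqrt{2}$, Eq. (A.8) gives $h_{pv} a_p / a_v \le \frac{1}{\sqrt{2}} - \sqrt{2} < 0$, so $a_p \ne 0$. We wish to argue that 
$-a_v/(h_{pv} a_p) \le y_v E$. Indeed, from Eq. (A.9), letting $C = \sigma_-(\varphi)$ and $S_1 = \sum_{c : A(c)=1} s_c$, 

$$\begin{aligned} \frac{-a_v}{h_{pv} a_p} &\le \frac{1}{\dfrac{1}{\sqrt{s_v/s_{c^*}}\, y_{c^*} E} - \dfrac{2CE}{\sqrt{s_v}} \big( \sqrt{s_v} + S_1 \big)} \\ &\le \frac{\sqrt{s_v}\,\sigma_-(c^*)\,E}{1 - \Big( \gamma_{c^*} + \dfrac{2C\, y_{c^*}}{\sqrt{s_{c^*}}} \big( \sqrt{s_v} + S_1 \big) \Big) E^2} , \end{aligned} \tag{A.10}$$

where in the second step we have multiplied numerator and denominator by $\sqrt{s_v/s_{c^*}}\, y_{c^*} E$, and applied 
the inequality $\frac{1}{(1-a)(1-b)} \le \frac{1}{1-(a+b)}$. The coefficient of $E^2$ in the denominator is bounded by 

$$\begin{aligned} \gamma_{c^*} + 4C^2 \big( \sqrt{s_v} + S_1 \big) &= 4C^2 \big( s_{c^*} \sigma_-(c^*) + \sqrt{s_v} + S_1 \big) \\ &\le 4C^2 \big( s_v \sigma_-(c^*) + \sqrt{s_v} \big) \\ &\le 4C^2 s_v \sigma_-(v) = \gamma_v , \end{aligned} \tag{A.11}$$

where we have used in the first inequality that $s_{c^*} + S_1 \le s_v$. Hence $-a_v/(h_{pv} a_p) \le y_v E$. □ 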
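
The two arithmetic facts that drive this induction can be spot-checked numerically. The Python sketch below is only an illustration under stated assumptions: it takes $\gamma_v = 4\sigma_-(\varphi)^2 s_v \sigma_-(v)$, reads the upper limit on $E$ in Lemma A.1 as $1/(8\sigma_-(\varphi)^3 N)^{1/2}$ with $N$ the number of leaves, and uses $\sigma_-(v) = 1/\sqrt{s_v} + \max_c \sigma_-(c)$. It then checks, at every vertex of a small tree, that $\gamma_v E^2 \le 1/2$ and that the chain in Eq. (A.11) holds in the worst case $S_1 = s_v - s_{c^*}$. The nested-list tree encoding and all function names are hypothetical. 

```python
import math

# A formula tree as nested lists: a leaf is [], a gate is the list of its subtrees.


def size(v):
    """Number of leaves below v."""
    return 1 if not v else sum(size(c) for c in v)


def sigma_minus(v):
    """sigma_-(v) = 1/sqrt(s_v) + max over children of sigma_-(c); equals 1 at a leaf."""
    own = 1.0 / math.sqrt(size(v))
    return own if not v else own + max(sigma_minus(c) for c in v)


def vertices(v):
    """Yield v and all of its descendants."""
    yield v
    for c in v:
        yield from vertices(c)


def check_gamma_bounds(root):
    """Return True if gamma_v * E^2 <= 1/2 and the (A.11) chain hold at every vertex."""
    C = sigma_minus(root)
    N = size(root)
    E = 1.0 / math.sqrt(8.0 * C**3 * N)  # assumed upper limit on E from Lemma A.1
    for v in vertices(root):
        s_v = size(v)
        gamma_v = 4.0 * C**2 * s_v * sigma_minus(v)
        if gamma_v * E**2 > 0.5 + 1e-12:
            return False
        for c_star in v:
            S1 = s_v - size(c_star)  # largest value allowed by s_{c*} + S_1 <= s_v
            lhs = 4.0 * C**2 * (size(c_star) * sigma_minus(c_star) + math.sqrt(s_v) + S1)
            if lhs > gamma_v + 1e-9:
                return False
    return True
```

For a gate whose two inputs are leaves, the chain in Eq. (A.11) holds with equality, which is why the comparison uses a small floating-point tolerance. 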